Scientific and technical report
Final Comprehensive Report

B. Description of progress made against milestones during the reporting period

Our blood biomarker proposal had several specific objectives and aims, listed here:

Aim 1: Develop better software for analyzing dynamically changing transcriptomes.

Aim 2: Analyze the blood and multiple organs/tissues of animals from three inbred
mouse strains exposed to the toxins acetaminophen and carbon tetrachloride for
transcriptome, protein and miRNA biomarkers.

Aim 3: Establish MRM mass spectrometry assays for at least 25 liver-specific blood
proteins based on the acetaminophen, CCl4, and other model systems of interest.

Aim 4: Analyze the blood and tissues of animals from three inbred mouse strains
exposed to the toxins acetaminophen and carbon tetrachloride for protein biomarkers
using proteomics technologies, including MRM.

Aim 5: Analyze time course experiments of rat tissues and blood exposed to VX.

Aim 6: Develop new technologies for generating protein-capture agents and for
analyzing single protein molecules.


Summary 

At ISB, we have conducted research related to these aims over the past several
years. Our results and conclusions are presented here. The discovery and validation of
organ-specific biomarkers are technically challenging. Nonetheless, we have made
considerable progress: in addition to the four papers already published on our work
(see below), we anticipate that at least four additional manuscripts will be written and
submitted for publication within the next few weeks or months. Two invention
applications have been filed with the patent office. Planned and actual publications and
invention disclosures are listed in the relevant research descriptions.

Progress  made  with  respect  to  the  specific  aims 

Transcriptome analysis software

Aim  1:  Develop  better  software  for  analyzing  dynamically  changing  transcriptomes. 

Two approaches were taken toward this aim. In the first approach (A), Victor Cassen
constructed a standard RNASeq analysis pipeline for standard transcriptomic
experiments. The second, more sophisticated approach (B), conducted by Gustavo
Glusman, addresses in detail how best to normalize, or scale, transcriptome data.

A.  Development  of  a  pipeline  for  processing  RNASeq  data 

RNA-Seq is a technology that is gaining popularity because of its high utility, sensitivity,
and reliability, while also benefiting from continually falling costs.

However, with the rapid rise in data volumes generated by RNA-Seq and other
high-throughput sequencing technologies, new challenges in data management and
analysis have arisen.

The RNASeq pipeline is a software package designed to aid in the computational
analysis of RNA-Seq data. Rather than embracing a single approach, the pipeline is
designed to facilitate running multiple, separately written analysis programs that
together constitute a meaningful analysis of the data at hand. This approach offers
several benefits relating to flexibility:

•  Different projects will have different needs; a project that is interested primarily in
identifying differential gene expression (e.g., to quantify networks) may not be
interested in novel alternative splicing discovery. However, various steps (e.g.,
filtering) may still be common to both processing pipelines. The RNASeq
pipeline allows the user to easily incorporate common software steps into various
pipelines, promoting software re-use.

•  As analysis algorithms are continuously refined, replaced, and superseded, they
can be swapped in and out of the RNASeq pipeline with relative ease.
Additionally, algorithms can easily and reproducibly be compared, either against
themselves (e.g., with different parameter values) or against other algorithms.
The RNASeq pipeline incorporates a provenance database that records all the
information needed to reproduce a pipeline run, as well as performance statistics.

Features  of  the  software: 

•  A  collection  of  pre-configured  pipelines  and  pipeline  steps  to  handle  common 
RNASeq  processing  tasks. 

•  Support for a variety of input data types, including single- and paired-end data.

•  Sun  Grid  Engine  (SGE)  support  allows  pipelines  to  be  run  on  clusters  for 
enhanced  performance. 

•  Recording  of  all  pipeline  runs,  including  reproduction  and  performance  data. 

•  Pipeline  run  reports. 

Methodology: 

The RNASeq pipeline works by combining the inputs of three configuration files per
run to produce a Unix shell script that can be run on a user's local desktop or
submitted to an SGE cluster. The three inputs correspond to 1) a description of the read
data to be processed, including its location on the host's filesystem; 2) a description of
the pipeline to be applied to the data; and 3) a site-wide configuration file containing
information pertinent to the host. The resulting shell script incorporates commands for
each step specified in the pipeline's configuration file, matching the outputs of various
steps to the inputs of dependent steps as directed. Interspersed with the commands for
each step are calls to RNASeq pipeline-specific commands that record the results and
execution times of each step in a per-user database, which is later used to generate
run reports.
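The configuration-to-script step described above can be pictured with a minimal Python sketch. It is not the pipeline's actual code: the dictionary keys (`path`, `steps`, `input_from`, `command`, `work_dir`, `recorder`), the tool names `filter_reads` and `align_reads`, and the `record_step` provenance command are all hypothetical. The sketch only shows how each step's output can be chained to the input of a dependent step and how a recording call can be interleaved after every command.

```python
import shlex

def build_script(read_config, pipeline_config, site_config):
    """Combine three configuration dicts into the text of a bash script.

    All keys used here are hypothetical stand-ins for the real configuration files.
    """
    lines = ["#!/bin/bash", "set -euo pipefail"]
    # Output path of every step run so far; the raw reads act as the first "output".
    outputs = {"reads": read_config["path"]}
    for step in pipeline_config["steps"]:
        # Feed this step the output of the step it depends on.
        input_path = outputs[step["input_from"]]
        output_path = f"{site_config['work_dir']}/{step['name']}.out"
        command = step["command"].format(input=shlex.quote(input_path),
                                         output=shlex.quote(output_path))
        lines.append("start=$(date +%s)")
        lines.append(command)
        # Hypothetical provenance call: log the step name and its run time in seconds.
        lines.append(f"{site_config['recorder']} {shlex.quote(step['name'])} "
                     "$(( $(date +%s) - start ))")
        outputs[step["name"]] = output_path
    return "\n".join(lines) + "\n"


# Example: a two-step pipeline (filter, then align) on one FASTQ file.
script = build_script(
    {"path": "/data/sample1.fastq"},
    {"steps": [
        {"name": "filter", "input_from": "reads", "command": "filter_reads {input} > {output}"},
        {"name": "align", "input_from": "filter", "command": "align_reads {input} > {output}"},
    ]},
    {"work_dir": "/scratch/run1", "recorder": "record_step"},
)
print(script)
```

Quoting paths with `shlex.quote` keeps unusual filenames safe in the generated script; submitting the same script to an SGE cluster (e.g., with `qsub`) is left out of this sketch.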

Documentation: 

A  webpage  describing  the  installation  and  use  of  the  pipeline  may  be 
found  at  http://vcassen.qdxbase.ora/RNA-SeQ 

B.  Analysis  of  digital  transcriptomes:  optimal  scaling  and  identification  of  tissue-specific 
genes  (Gustavo  Glusman  reporting) 

Conceptual overview: We have developed a comprehensive transcriptomics analysis
pipeline focusing on digital transcriptomics data (MPSS, RNA-Seq). The analysis
pipeline links a series of computational tasks, as shown in Figure 1.
Figure 1. Starting from the expression data sets, known transcripts and the genome reference
(red boxes), a series of computational tasks (green boxes) leads to the identification of
tissue-specific genes (blue). These tasks are supported by a variety of additional
computational procedures (yellow).

We developed a variety of tools for performing the preliminary data management and
transformation procedures. Parsing and data correction methods are naturally
technology-specific. Proper interpretation of the observed sequence reads, via mapping
to transcript clusters and to the genome, is crucial for avoiding both false negatives (by
discarding or mismapping reads) and false positives (by mismapping reads or by giving
credence to ambiguous mapping results). Nevertheless, while these potential failures
can lead to errors in establishing the relative expression levels of different genes, they
are largely consistent across samples when the expression level of any one gene is
evaluated. In contrast, proper data normalization is crucial in preparation for
cross-sample comparisons of gene expression levels. These comparisons are the central
concept on which the identification of tissue-specific genes is based. We therefore spent
significant effort studying and perfecting methods for accurate data normalization via
scaling algorithms.

Methods  for  normalization:  In  contrast  to  methods  based  on  hybridization,  in  digital 
transcript  counting  the  observed  absolute  expression  level  of  each  gene  and  transcript 
will  depend  on  the  depth  to  which  each  sample  was  sequenced:  deeper  sequencing  will 
uncover  transcripts  expressed  at  very  low  levels,  and  will  proportionally  increase  the 
observed  expression  levels  of  more  prevalent  transcripts. 

Many data standardization methods have been proposed to date, most frequently by
scaling the values observed in each sample. The most commonly used method
normalizes expression values to the total number of reads observed in each sample:
gene expression values are thus expressed in terms of “counts per million” (CPM) or
“transcripts per million” (TPM); for RNA-Seq, the equivalent measure is “reads per
kilobase per million” (RPKM), which additionally corrects for transcript length. This
method has the advantage of simplicity, as samples can be normalized independently,
but its results are sensitive to highly expressed genes: since much of the sequencing
output is spent on the most prevalent genes, the presence of a few highly expressed
tissue-specific genes can significantly lower the CPM values of all other genes, often
leading to the wrong conclusion that the latter are “down-regulated”.
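To make the CPM distortion concrete, here is a small Python/NumPy sketch on invented toy counts (not data from this study). Genes 1 and 2 have identical counts in both samples, but sample B also expresses one very abundant tissue-specific gene; dividing by total reads then makes genes 1 and 2 look about 3.3-fold lower in B. The `rpkm` helper additionally divides by transcript length in kilobases.

```python
import numpy as np

def cpm(counts):
    """Counts per million: divide each sample (column) by its total read count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return cpm(counts) / lengths_kb

# Toy data (genes x samples A, B).
counts = np.array([
    [100, 100],   # gene 1: same counts in both samples
    [200, 200],   # gene 2: same counts in both samples
    [0,   700],   # tissue-specific gene, highly expressed only in sample B
])
c = cpm(counts)
print(c)
print("apparent fold change of gene 1 (A/B):", c[0, 0] / c[0, 1])
```

Because each sample's values always sum to one million, any reads absorbed by a few dominant genes necessarily lower every other gene's CPM value.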
One way to avoid such distortions is to normalize gene counts in terms of quantiles,
e.g., the median expression value, as commonly used for normalization of microarray
data. Because of the preponderance of zero and low-count genes in digital
transcriptomes, the median value is usually uninformative, but expression values may
instead be scaled based on the upper quartile. Alternatively, it is possible to adjust the
expression levels of all genes so that the distributions of all samples become identical
(Quantile Normalization). This method cancels global biases but, since it does not rely
on scaling, it distorts the pairwise gene expression ratios within each sample.
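Upper-quartile scaling can be sketched as follows. Excluding zero counts before taking the quantile, and expressing the factors relative to their mean, are assumptions of this sketch rather than details given in the report; passing `q=0.9` gives the upper-decile variant.

```python
import numpy as np

def upper_quantile_factors(counts, q=0.75):
    """One scaling factor per sample (column): the q-th quantile of its non-zero
    counts, divided by the mean of these quantiles across samples."""
    counts = np.asarray(counts, dtype=float)
    quantiles = np.array([np.quantile(col[col > 0], q) for col in counts.T])
    return quantiles / quantiles.mean()

def scale(counts, factors):
    """Divide every sample (column) by its scaling factor."""
    return np.asarray(counts, dtype=float) / np.asarray(factors, dtype=float)

# Toy example: sample 2 was sequenced twice as deeply as sample 1.
counts = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [0, 0]])
factors = upper_quantile_factors(counts)   # about [0.667, 1.333]
scaled = scale(counts, factors)            # both columns become identical
print(factors)
print(scaled)
```

Unlike CPM, the factor here ignores the top quarter of each sample's genes, so a handful of extremely abundant transcripts cannot drag it around.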

A  different  class  of  normalization  methods  relies  on  the  expression  level  of  a  subset 
of  the  genes  to  “guide”  the  normalization.  At  its  simplest  level,  one  may  assume  that  the 
expression  levels  of  certain  “housekeeping”  genes  (e.g.,  GAPDH,  HPRT)  are  constant 
across  cell  types,  and  can  therefore  be  used  individually  as  an  internal  normalization 
tool.  This  assumption  has  been  shown  to  be  invalid  in  various  scenarios,  leading  to 
incorrect  results.  More  elaborate  methods  therefore  rely  on  minimizing  the  variance  of 
not  just  one,  but  a  small  set  of  guide  genes,  or  consider  the  relative  RNA  production  of 
pairs  of  samples,  under  the  assumption  that  the  majority  of  genes  are  not  differentially 
expressed. 

We  implemented  a  large  number  of  existing  normalization  methods  and  some 
variations  on  them,  and  also  devised  entirely  different  approaches  to  data  normalization. 
We  encapsulated  all  these  methods  in  an  open-source,  portable  Perl  module  and  an 
equivalent  R  package. 

Classification and structure of normalization methods: We identified a small number of
core concepts on which the many normalization methods are based. With the single
exception of the Quantile Normalization method, all the algorithms we considered are
global procedures that scale all the values in each sample by a single “scaling factor”
(Figure 2). Some of the scaling methods are based on a global characteristic of each
sample (Fig. 2, left); i.e., they study each sample independently and identify a
characteristic value used for normalization. This characteristic value can be the sum of
all counts (CPM method) or a specific expression level, e.g., at the upper quartile. We
added two variations on these methods: using total counts but scaling samples relative
to each other (“Total” method), and scaling to the upper decile.

Alternatively, some scaling methods are based on pre-selected genes (Fig. 2, right),
either using the expression value of a single housekeeping gene to guide normalization,
or selecting the subset of housekeeping genes that are most consistent with each other
(“most stable”) and using the geometric mean of their expression values (equivalently,
the mean of their log-transformed values) to guide normalization (geNorm method).
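The geNorm approach can be sketched as follows, based on the published geNorm idea: a gene's stability value M is the average standard deviation of its log2 ratios with every other candidate gene, the least stable gene is dropped repeatedly, and each sample's factor is the geometric mean of the surviving genes. The stopping point (`keep=2`) and the toy numbers are illustrative choices, not the settings used in this work.

```python
import numpy as np

def genorm_m(expr):
    """geNorm stability value M for each candidate gene (lower = more stable).

    expr: genes x samples array of positive expression values."""
    log_expr = np.log2(np.asarray(expr, dtype=float))
    n_genes = log_expr.shape[0]
    m = np.empty(n_genes)
    for j in range(n_genes):
        # Standard deviation, across samples, of the log2 ratio with each other gene.
        sds = [np.std(log_expr[j] - log_expr[k], ddof=1)
               for k in range(n_genes) if k != j]
        m[j] = np.mean(sds)
    return m

def genorm_factors(expr, keep=2):
    """Repeatedly drop the least stable gene until `keep` genes remain; the
    per-sample normalization factor is the geometric mean of the remaining genes."""
    expr = np.asarray(expr, dtype=float)
    idx = list(range(expr.shape[0]))
    while len(idx) > keep:
        m = genorm_m(expr[idx])
        idx.pop(int(np.argmax(m)))
    # Geometric mean computed as exp(mean(log(x))).
    factors = np.exp(np.log(expr[idx]).mean(axis=0))
    return factors, idx

# Toy example: genes 0 and 1 track a 1:2:4 sample effect, gene 2 does not.
expr = [[10, 20, 40], [5, 10, 20], [7, 30, 9]]
factors, kept = genorm_factors(expr)
print(kept, factors / factors[0])
```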

Methods of the third class, which are based on genes selected from the data (Fig. 2,
bottom), start by identifying a (usually large) set of genes expressed in the samples to
be compared, and then use various combinations of these genes to derive the scaling
factors that render the samples comparable. The TMM algorithm is one such fully
data-driven method, based on pairwise sample comparison of “double-trimmed” genes
(i.e., trimmed first by absolute expression level ranks within each sample, and then by
expression ratios between the two samples). We explored a variety of novel methods
that use single trimming (by expression level ranks within each sample) and that scale
all samples simultaneously. In particular, we created the novel Network Centrality
Scaling (NCS) algorithm, which uses pairwise gene co-expression as a similarity metric
and identifies the most central genes in the resulting network. These central genes are
particularly well suited to serve as normalization guides.
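For reference, a simplified TMM (trimmed mean of M-values) factor might look like the sketch below. It trims by log-ratio (M) and by average log expression (A) at the same time, as edgeR's implementation does, rather than in the sequential order described above; the default trims (30% on M, 5% on A) and the toy counts are illustrative. It is not the code used in this study, and it does not attempt to reproduce NCS.

```python
import numpy as np

def tmm_factor(sample, ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref` (raw counts)."""
    sample = np.asarray(sample, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_s, n_r = sample.sum(), ref.sum()
    keep = (sample > 0) & (ref > 0)              # log-ratios need non-zero counts
    ys, yr = sample[keep], ref[keep]
    m = np.log2((ys / n_s) / (yr / n_r))         # M: log2 fold change
    a = 0.5 * np.log2((ys / n_s) * (yr / n_r))   # A: average log2 expression
    # Approximate variance of M; genes are weighted by its inverse.
    var = (n_s - ys) / (n_s * ys) + (n_r - yr) / (n_r * yr)
    # Trim extreme genes by M and by A.
    m_lo, m_hi = np.quantile(m, [trim_m, 1 - trim_m])
    a_lo, a_hi = np.quantile(a, [trim_a, 1 - trim_a])
    sel = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)
    w = 1.0 / var[sel]
    return 2 ** (np.sum(w * m[sel]) / np.sum(w))

ref = np.arange(10, 110, 10)        # 10 genes in the reference sample
sample = ref.copy()
sample[-1] = 5000                   # one gene is massively expressed in `sample` only
f = tmm_factor(sample, ref)
print(f * sample.sum(), ref.sum())  # effective library sizes now agree (about 550 each)
```

The dominant gene is trimmed away, so the factor reflects only the genes that are not differentially expressed, which is exactly the assumption stated for this class of methods.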

Finally, scaling methods can be devised using randomly picked values (Fig. 2, top). In
particular, we implemented an Evolution Strategy algorithm that stochastically identifies
solutions that maximize an objective function: the number of genes expressed uniformly
across samples.


Figure  2.  Taxonomy  of  normalization  methods. 


Even when based on very different concepts, the various normalization methods may
share one or more computational procedures. We dissected the methods into distinct
computational steps and identified components shared among the various algorithms.
We organized these into a chart resembling a map of bus lines (Figure 3), leading from
the raw data matrix (green “station” in Fig. 3) to one of three possible endpoints (red
“stations”). In this visualization, “stations” represent intermediate results, and the lines
connecting them represent computational steps. As shown in the figure, there are four
different initial actions: 1) ignore the data except for the pre-determined housekeeping
genes; 2) ignore the data entirely and select random scaling values; 3) use the data
solely to compute the total expression in each sample; 4) sort the data matrix in
preparation for a variety of more complex computations.

The  sorted  data  matrix  can  then  be  used  to  compute  rank-specific  averages  (for  the 
Quantile  Normalization  method),  to  identify  the  expression  levels  at  different  percentiles 
of  the  distribution  (for  the  upper  quartile  and  upper  decile  methods),  or  to  identify 
ubiquitous  genes.  Ubiquitous  genes  are  then  used  by  several  methods  in  diverse  ways. 
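The rank-specific averaging used by Quantile Normalization can be sketched in a few lines of NumPy. Ties within a sample are broken by sort position, a simplification of this sketch, and the 4 × 3 matrix is invented for illustration.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a genes x samples matrix so all samples share one distribution."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # row indices that sort each sample
    rank_means = np.sort(x, axis=0).mean(axis=1)  # rank-specific averages across samples
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # The gene holding rank r in sample j receives the r-th rank-specific average.
        out[order[:, j], j] = rank_means
    return out

x = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]])
q = quantile_normalize(x)
print(q)
```

Every column of the result contains the same set of values, which removes global biases but, as noted earlier, changes the ratios between genes within a sample.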

Except for the CPM and Quantile Normalization methods, all the algorithms we
describe here converge on a set of relative scaling factors, which we then adjust to
preserve the global scale of the data set. These scaling factors can be computed from:
1) equal gene weights, 2) variable gene weights, 3) whole-sample weighted means, 4) a
target value per sample, or 5) random values.
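The report does not specify how the relative scaling factors are adjusted to keep the global scale of the data set. One simple possibility, assumed here purely for illustration, is to divide the factors by their geometric mean, so that their product is 1 and the mean log expression across samples is unchanged by scaling.

```python
import numpy as np

def center_factors(relative_factors):
    """Rescale relative factors so their geometric mean is 1 (ratios between them unchanged)."""
    f = np.asarray(relative_factors, dtype=float)
    return f / np.exp(np.log(f).mean())

factors = center_factors([0.5, 2.0, 4.0])
print(factors, np.prod(factors))
```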
We found that the best results are obtained with methods involving stochastic
optimization, and with certain methods based on the analysis of ubiquitous genes.
Figure 3. Computational pathways for normalization.


Assessment of the success of normalization methods: The availability of many
normalization methods, which produce different results, raises the more complex
question of how to assess which result is correct. We devised three different ways of
evaluating the diverse solutions produced by the different algorithms.

1)  Maximization  of  uniform  genes.  We  defined  uniform  genes  as  those  with  very  similar 
expression  levels  across  all  samples  studied.  Mathematically,  we  used  a  stringent  upper 
cutoff  on  the  coefficient  of  variation.  Successful  normalization  methods  adjust  expression 
values  in  such  a  way  that  a  larger  number  of  genes  reach  uniformity.  Importantly,  this 
metric  (number  of  uniform  genes)  can  be  used  as  an  objective  function,  which  can  be 
optimized  using  stochastic  algorithms.  We  implemented  an  Evolution  Strategy  (ES)  to  do 
this.  The  ES  consistently  identifies  the  best  solutions  based  on  this  criterion  (Figure  4). 
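The uniform-gene objective and its stochastic optimization can be sketched as a minimal (1+1) Evolution Strategy in Python/NumPy. The CV cutoff of 0.1, the mutation size, the number of generations, and the secondary tie-breaker on median CV (added so the search can move across plateaus where the uniform-gene count does not change) are assumptions of this sketch, not parameters reported for the actual ES.

```python
import numpy as np

def objective(counts, factors, cv_max=0.1):
    """(number of uniform genes, -median CV) after dividing each sample by its factor.

    Tuples compare lexicographically, so the count of genes with CV <= cv_max
    dominates and the median CV (an assumed tie-breaker) only decides ties."""
    scaled = counts / factors
    mean = scaled.mean(axis=1)
    expressed = mean > 0
    cv = scaled[expressed].std(axis=1) / mean[expressed]
    return int(np.sum(cv <= cv_max)), -float(np.median(cv))

def evolve_factors(counts, generations=2000, sigma=0.1, cv_max=0.1, seed=0):
    """(1+1) Evolution Strategy over per-sample log scaling factors."""
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    log_f = np.zeros(counts.shape[1])          # start from "no normalization"
    best = objective(counts, np.exp(log_f), cv_max)
    for _ in range(generations):
        child = log_f + rng.normal(0.0, sigma, size=log_f.shape)
        child -= child.mean()                  # fix global scale: geometric mean of factors = 1
        score = objective(counts, np.exp(child), cv_max)
        if score >= best:                      # keep the child if it is at least as good
            log_f, best = child, score
    return np.exp(log_f), best

# Toy data: 200 genes whose counts differ between samples only by depth factors 1, 2, 0.5.
true_factors = np.array([1.0, 2.0, 0.5])
counts = np.linspace(10, 1000, 200)[:, None] * true_factors
factors, best = evolve_factors(counts)
print(objective(counts, np.ones(3)), best)
```

Because a child is accepted only when it scores at least as well as its parent, the objective never decreases over the run, mirroring the path from “None” to “ES” shown in Figure 4.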

2) Maximal decorrelation of sample rankings. We found that, under improper
normalization, different genes tend to rank samples (by expression level) similarly.
Conversely, more successful normalization methods minimize this correlation (Figure 5).
We further found that our best methods bring the average correlation close to zero, its
theoretical minimum in absolute value. This suggests that these methods are not just
successful but approach optimality.
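The decorrelation criterion can be computed as the mean pairwise Spearman correlation between the sample rankings induced by each gene. This NumPy sketch ranks samples within each gene with a double `argsort` (tied values are not averaged, a simplification) and then takes Pearson correlations of those ranks, which equals the Spearman correlation when there are no ties.

```python
import numpy as np

def mean_ranking_correlation(scaled):
    """Average Spearman correlation between the sample rankings of every pair of genes.

    scaled: genes x samples matrix of normalized expression values."""
    x = np.asarray(scaled, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=1), axis=1).astype(float)  # rank samples per gene
    corr = np.corrcoef(ranks)                   # genes x genes correlation of ranks
    upper = np.triu_indices_from(corr, k=1)     # each gene pair once, no diagonal
    return float(corr[upper].mean())

# Improper scaling: every gene ranks the three samples identically, giving correlation 1.
unscaled = np.linspace(10, 1000, 50)[:, None] * np.array([1.0, 2.0, 0.5])
print(mean_ranking_correlation(unscaled))
```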
Figure 4. Number of genes identified as specific to one sample vs. number of genes observed to
be consistently expressed across samples, for various normalization methods. The numbers in
the orange circles denote the number of housekeeping genes combined using the geNorm
algorithm. The dashed arrows show the stochastic path of the ES from the data prior to
normalization (white square, “None”) to the best approximation of the optimal solution (gray
square, ES). Brown squares represent the results obtained via the TMM method, using each of
the 16 samples as reference.


Figure  5.  Density  distribution  of  Spearman  correlations  of  sample  rankings  for  some  normalization 
methods. 
3) Similarity among solutions. While many methods produce very different solutions to
the normalization problem, a subset of them produce similar results (Figure 6). These
methods are conceptually and computationally very different from each other, and yet
they converge on a common solution. This is also the solution that maximizes uniform
genes and maximally decorrelates samples, lending it strong credence.
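Agreement between methods, as summarized in Figure 6, can be quantified by correlating their scaling-factor vectors. In this sketch the method names and factor values are invented, correlations are computed on log-transformed factors (an assumption), and the 0.75 threshold is borrowed from the Figure 6 color scale.

```python
import numpy as np

def factor_similarity(factors_by_method, threshold=0.75):
    """Correlate methods' scaling factors (log scale) and list highly similar pairs."""
    names = sorted(factors_by_method)
    logs = np.log(np.array([factors_by_method[name] for name in names], dtype=float))
    corr = np.corrcoef(logs)                     # methods x methods
    similar = [(names[i], names[j])
               for i in range(len(names)) for j in range(i + 1, len(names))
               if corr[i, j] > threshold]
    return names, corr, similar

names, corr, similar = factor_similarity({
    "method_a": [1.0, 2.0, 4.0, 0.5],
    "method_b": [1.1, 2.1, 3.9, 0.5],
    "method_c": [4.0, 1.0, 0.5, 2.0],
})
print(similar)
```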


Figure  6.  Comparison  between  the  scaling  factors  suggested  by  the  different  methods.  Lower  left: 
the  resulting  scaling  factors  for  the  heart  sample.  Upper  right:  Pairwise  correlations  between  the 
methods,  for  all  samples.  Red  shades  denote  high  correlation  values  (above  0.75),  blue  denotes 
low  correlation  (or  anticorrelation).  The  column  to  the  right  indicates  the  number  of  uniform  genes 
identified  by  the  method.  The  Quantile  Normalization  method  is  not  included  in  this  analysis  since 
it  does  not  produce  scaling  factors. 

Analysis  of  tissue  specificity:  We  analyzed  the  resulting  gene  expression  values 
following  normalization  with  various  protocols.  We  used  a  few  definitions  to  identify 
uniform,  specific,  enriched  and  depleted  genes: 

1) Background noise level. We considered genes with a median expression value under
10 reads to be at noise level.

2) Variation bandwidth. We considered a two-fold change in expression (up or down
from the median) to be within “acceptable limits”, beyond which a gene may be
differentially expressed.

3) For the purpose of this analysis, we considered genes as “uniform” if their median
level is above background and all samples are within the variation bandwidth around
the median.

4)  We  considered  a  gene  to  be  “specific”  to  a  sample  (or  to  two)  if  its  median  expression 
is  within  noise  level,  but  above  the  variation  bandwidth  for  the  single  sample  (or  both 
samples). 

5)  We  similarly  considered  a  gene  to  be  “enriched”  to  a  sample  (or  to  two)  if  its  median 
expression  is  above  noise  level,  and  above  the  variation  bandwidth  for  the  single  sample 
(or  both  samples). 

6) We considered a gene to be “depleted” in a sample (or in two) if its median expression
is above noise level, but below the variation bandwidth for the single sample (or both
samples).
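The six definitions above translate into a small classification function. This is a literal reading offered as a sketch: it applies the two-fold band around the median for every criterion, treats “above the bandwidth” as strictly greater than twice the median, and only distinguishes one or two outlying samples; these are interpretive assumptions.

```python
import numpy as np

NOISE = 10    # median below 10 reads = background noise (definition 1)
FOLD = 2.0    # two-fold band around the median (definition 2)

def classify_gene(values):
    """Return the set of labels (uniform / specific / enriched / depleted) for one gene.

    `values` are the gene's normalized expression levels across samples."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    n_above = int(np.sum(v > med * FOLD))   # samples above the bandwidth
    n_below = int(np.sum(v < med / FOLD))   # samples below the bandwidth
    labels = set()
    if med < NOISE:
        if n_above in (1, 2):
            labels.add(f"specific (to {n_above})")
    else:
        if n_above == 0 and n_below == 0:
            labels.add("uniform")
        if n_above in (1, 2):
            labels.add(f"enriched (in {n_above})")
        if n_below in (1, 2):
            labels.add(f"depleted (in {n_below})")
    return labels

print(classify_gene([100, 100, 100, 100, 500]))
```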

In  this  example  comparison  (Table  1)  between  raw  data  (no  normalization), 
normalization  to  total  counts,  and  the  network  centrality  scaling  as  an  approximation  to 
the  optimal  scaling  solution,  we  see  that  successful  scaling  yields  much  larger  numbers 
of  uniform  genes,  as  expected. 
Genes              Raw     Total/CPM    NCS
Uniform            513     1002         1559
Specific (to 1)    3790    3723         3648
Specific (to 2)    2139    2224         2185
Enriched (in 1)    474     842          1323
Enriched (in 2)    256     418          628
Depleted (in 1)    848     1020         824
Depleted (in 2)    661     521          349

Table 1. Uniform, specific, enriched and depleted genes for the raw data, or after
normalization to total counts (equivalent to counts per million) or Network Centrality
Scaling (NCS).

If  a  gene  is  not  observed  (or  observed  at  noise  levels)  in  any  sample  except  for  one, 
it  will  be  trivially  identified  as  specific  to  that  sample.  Such  determination  is  largely 
unaffected  by  normalization,  and  indeed  the  number  of  “specific”  genes  changes  little. 
When  the  expression  level  in  the  specific  sample  is  low,  there  is  room  for  a  false-positive 
determination  of  specificity.  Proper  normalization  may  correct  this,  yielding  somewhat 
fewer  specific  genes,  as  observed. 

When the expression level in most samples is significantly above noise level but one
sample (or a few) has a significantly higher expression level, the gene is also
specifically expressed in that sample; for simplicity, we call such a gene “enriched” in
that sample. Identifying “enriched” genes is not trivial and is strongly susceptible to
normalization. The number of tissue-enriched genes identified by the optimal scaling
methods is much larger than by simpler (e.g., CPM) methods or in the raw data. This
beneficial result was not imposed by the normalization methods themselves, and we
count it as additional evidence of successful normalization.

We also explored the symmetric situation, in which just one sample shows a
significantly lower level than the rest (“depleted” genes). This situation can reflect a true
biological effect, though infrequently. It is much more commonly observed as a result of
improper normalization: e.g., when normalizing by CPM, a highly expressed
tissue-specific gene may cause the expression values of other genes to be
downgraded, resulting in their apparent “depletion”.

Publications:  The  work  described  above  is  being  prepared  for  publication. 

“Optimal  Scaling  of  Digital  Transcriptomes”  Gustavo  Glusman,  Max  Robinson,  Burak 
Kutlu,  Juan  Caballero  and  Leroy  Hood.  An  earlier  version  of  this  manuscript  has 
received  DOD  approval. 

Software:  The  software  for  these  analyses  is  under  continuous  development,  and 
can  be  found  here:  http://db.svstemsbioloav.net/Qestalt/normalizer/ 

Biomarker  studies  on  the  mouse  model  system 

Aim 2: Analyze the blood and multiple organs/tissues of animals from three inbred
mouse strains exposed to the toxins acetaminophen and carbon tetrachloride for
transcriptome, protein and miRNA biomarkers.

Aim 3: Establish MRM mass spectrometry assays for at least 25 liver-specific blood
proteins based on the acetaminophen, CCl4, and other model systems of interest.

Aim 4: Analyze the blood and tissues of animals from three inbred mouse strains
exposed to the toxins acetaminophen and carbon tetrachloride for protein biomarkers
using proteomics technologies, including MRM.

Because  these  three  aims  are  all  interconnected,  we  report  progress  for  all  of  them 
together,  first  describing  work  done  in  proteomics,  and  then  work  done  on 
transcriptomes. 

Several ISB researchers worked on these aims, with the result that multiple proteins
likely to be primarily liver-specific biomarkers were identified in the blood using various
mass spectrometry and antibody-based technologies. Two sets of in vivo mouse
experiments were done using acetaminophen perturbation. In one set (described in A,
B and D below), time course measurements were analyzed for each animal separately,
revealing a wide range of responses among individual animals even though we used
an inbred strain (results are ...). In the other set (described in C below), which was done
first to survey the landscape, animal samples collected at various time points were
pooled.

The first report on these three aims is from Shizhen Qin, who describes the
experimental methods and results, with a focus on using MRM mass spectrometry to
identify biomarkers indicative of damage from acetaminophen and CCl4 exposure. The
second report is from Bingyun Sun, who examined protein abundance changes in
several organs after exposure to acetaminophen. The third report comes from Chris
Lausted, Zhiyuan Hu, and Hyuntae Yoo, who did the initial biomarker survey experiments
with pooled mouse samples. Finally, Kai Wang reports his work analyzing the effects of
acetaminophen treatment on the mouse liver transcriptome.

A.  Mouse  model  system  and  MRM  results  (Shizhen  Qin  reporting). 

Introduction. In the assessment of liver damage caused by drugs and chemicals,
measurement of the enzymes ALT (alanine transaminase) and AST (aspartate
transaminase) is widely used. Necrosis or membrane damage releases these enzymes
into the circulation, where they can be measured in blood. AST (AST1 and AST2) is
distributed mainly in heart, muscle, brain, liver and kidney. Damage to any of these
tissues releases AST protein into the bloodstream, so AST is not highly liver-specific.
For example, following a myocardial infarction, serum AST levels are elevated and
peak 48 to 60 hours after onset. ALT1 is about 3- to 4-fold more enriched in liver,
followed by kidney, heart, muscle, pancreas and lung, and is more liver-specific than
AST. ALT2, however, is similar to AST: it is not enriched in liver relative to many other
tissues and is found mainly in muscle, liver, heart, pancreas, prostate and spinal cord.
Laboratory blood tests that quantify ALT and AST by colorimetric and ultraviolet
catalytic enzymatic reactions detect both isoenzymes (1 and 2) of each enzyme,
because they cannot measure the activity of each isoenzyme separately. In addition to
their limited liver specificity, both enzymes have short half-lives in the bloodstream, so
ALT and AST activities are present in blood only for a short period after liver injury.
New, better liver injury markers are therefore needed. In this project, we used the known
liver toxins acetaminophen (APAP) and carbon tetrachloride (CCl4) in mouse models,
with liver-specific proteins as targets and the mass spectrometry technology SRM
(selected reaction monitoring) for quantitative proteomics, to discover new potential
markers of liver injury induced by these two chemicals.

Brief  description  of  materials  and  methods: 

1.  Animals,  drug  treatment  and  plasma  preparation 
In the pilot studies, we tested three inbred mouse strains from the Jackson Laboratory
(C57BL/6J, A/J and SJL) and found that all three responded similarly to APAP and CCl4
liver toxicity, as measured by elevations of blood ALT and AST levels. A fourth strain,
NOD/ShiLtJ, proved more sensitive than the other three to both APAP and CCl4 toxicity,
as judged by an earlier response and higher blood ALT and AST levels after treatment.
Therefore, C57BL/6J (B6) and NOD/ShiLtJ (NOD) were used for the full tests. (Mouse
blood and tissue samples were used for the analyses described below.)

Female B6 and NOD mice, 8 weeks of age, were injected intraperitoneally (IP) with
375 mg/kg of acetaminophen dissolved in PBS or with 1 ml/kg CCl4 diluted 1:10 in sterile
corn oil. Control animals received the same volume of PBS (acetaminophen controls) or
corn oil (CCl4 controls) at each time point. Blood was drawn at 3, 8, 12, 24, 48, 72 and 96
hours after injection for SRM analyses. For western blot analyses, time points were
extended to 120, 144 and 168 hours. Mice receiving APAP or PBS were fasted for 24
hours before injection (fasting started at 9:30 am). CCl4-treated mice and corn oil
controls were not fasted. The number of mice treated at each time point is
summarized in Table 2.
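The dosing arithmetic can be checked with a short helper. The 20 g body weight is a hypothetical example (the report does not give weights), and reading “diluted 1:10” as a ten-fold dilution (1 volume of CCl4 in 10 volumes total) is an assumption.

```python
def apap_dose_mg(body_weight_g, dose_mg_per_kg=375.0):
    """Acetaminophen dose in mg: (mg/kg) x (kg body weight)."""
    return dose_mg_per_kg * body_weight_g / 1000.0

def ccl4_injection_ul(body_weight_g, dose_ml_per_kg=1.0, dilution_fold=10.0):
    """Injected volume (µl) of diluted CCl4, assuming '1:10' means a 10-fold dilution."""
    neat_ul = dose_ml_per_kg * body_weight_g   # (ml/kg) x g is numerically equal to µl
    return neat_ul * dilution_fold

weight_g = 20.0  # hypothetical mouse body weight
print(apap_dose_mg(weight_g), "mg APAP")
print(ccl4_injection_ul(weight_g), "µl of CCl4/corn oil")
```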


Group         No treatment  3h   8h   24h  48h  72h  96h  120h  144h  168h
APAP/B6       3             5    5    10   9*   3    3    6*    6*    5*
PBS/B6        -             3    3    3    3    3    3    2     2     2
APAP/NOD      3             5    10   5    5    7    5    8     8     5*
PBS/NOD       -             3    3    3    3    3    3    2     2     2
CCl4/B6       3             3    3    2*   3    3    3    4     4     4
Corn oil/B6   -             3    3    3    3    3    3    2     2     2
CCl4/NOD      3             3    3    3    3    3    3    4     4     4
Corn oil/NOD  -             3    3    3    3    3    3    2     2     2

Table 2. Number of mice treated at each time point. Controls were treated with PBS (for APAP) and corn oil (for CCl4). Because of drug-treatment-related deaths, numbers marked with * indicate the number of mice surviving at that time point.

Blood samples were drawn by cardiac puncture. Typically, 400 µl of blood was obtained from each mouse. Plasma was prepared following Tammen's method (see references at the end of this section). Plasma samples were stored at -80°C without proteinase inhibitors.

All work for this project involving live animals was conducted under Institutional Animal Care and Use Committee (IACUC) approved protocols (10-00 series). ISB has an Assurance on file with the Office of Laboratory Animal Welfare (OLAW Assurance #A4355-01) and is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC Accreditation #001363). All animal work was performed in our specific-pathogen-free (SPF) vivarium facility.

2.  Quantitate  AST,  ALT  levels  in  treated  and  control  mouse  plasma 

Plasma ALT and AST levels were determined colorimetrically using ALT and AST reagent kits according to the manufacturer's instructions (TECO Diagnostics, Anaheim, CA). Specimens were analyzed on the day of collection, and all measurements were performed in duplicate.

3.  Blood  Acetaminophen  concentration  measurement  after  injection 

Blood acetaminophen concentrations were measured in plasma with an acetaminophen ELISA kit (Neogen Corporation, Lexington, KY) that has less than 2% cross-reactivity with other compounds such as procainamide. All tests were performed according to the instruction manual provided.
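ELISA absorbance readings are usually converted to concentrations through a standard curve. The report does not say how the kit's curve was fitted; a common choice is a four-parameter logistic (4PL) model. The minimal sketch below assumes a 4PL with hypothetical, already-fitted parameters and inverts it to recover a concentration from an absorbance.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic response at concentration x.

    a: response at zero concentration, d: response at infinite concentration,
    c: inflection point (mid-response concentration), b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> float:
    """Concentration that produces response y (y must lie strictly between a and d)."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response is outside the range of the standard curve")
    # Solve y = d + (a - d) / (1 + (x/c)^b) for x.
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for a competitive ELISA (signal falls as APAP rises).
params = dict(a=2.0, b=1.2, c=10.0, d=0.1)
absorbance = four_pl(50.0, **params)
print(round(inverse_four_pl(absorbance, **params), 6))  # 50.0
```

Readings outside the curve's range raise an error instead of being extrapolated, which is the usual conservative choice for immunoassays.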

4.  Plasma  sample  preparation  for  SRM 

To reduce the complexity of the plasma proteome, the 14 most abundant proteins were depleted. Depleted plasma samples were digested with trypsin and desalted with Oasis MCX cartridges (Waters, Milford, MA).
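Trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), conventionally not when the next residue is proline (P). The minimal in-silico digestion below follows that textbook rule with no missed cleavages. It is useful for listing candidate peptides; the sequence in the usage line is only illustrative.

```python
import re

# Cleave after K or R unless the next residue is P (the classic trypsin rule).
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str, min_len: int = 1) -> list[str]:
    """Fully cleaved tryptic peptides (no missed cleavages)."""
    peptides = _TRYPSIN_SITE.split(sequence)
    # A sequence ending in K/R yields a trailing empty string; drop it and short peptides.
    return [p for p in peptides if len(p) >= min_len]


print(tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRRDTHK"))
# ['MK', 'WVTFISLLLLFSSAYSR', 'GVFR', 'R', 'DTHK']
```

Splitting on a zero-width pattern requires Python 3.7 or later.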

5.  Preparation  of  the  liver-specific  and  liver-enriched  proteins  list 

We used a targeted approach focused on organ-specific proteins to increase the likelihood of identifying blood protein biomarkers that reflect the pathology of a particular organ. Our list of liver-specific or liver-enriched proteins (liver proteins) was created by analyzing tissue specificity in RNA datasets and by performing an organ-specific protein search with Gene Atlas Interface analysis. The mouse databases searched were 3 datasets from NCBI GEO (Gene Expression Omnibus) covering a total of 179 mouse tissues. Because of the similarity between the mouse and human genomes, human proteins that we identified as liver-specific or liver-enriched (Qin et al., 2012, in preparation) were also included in the mouse liver protein list.
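The report does not state the numerical criteria used to call a gene liver-specific or liver-enriched. The minimal sketch below uses hypothetical thresholds for illustration only. A gene is called liver-specific when its liver expression is at least 10-fold higher than in every other tissue, and liver-enriched when it is at least 5-fold the median of the other tissues.

```python
from statistics import median


def classify_liver_gene(expression: dict[str, float],
                        specific_fold: float = 10.0,
                        enriched_fold: float = 5.0) -> str:
    """Label a gene from a tissue -> expression mapping.

    Thresholds are hypothetical: 'liver-specific' if liver expression is at
    least specific_fold x the highest other tissue, 'liver-enriched' if at
    least enriched_fold x the median of the other tissues.
    """
    liver = expression["liver"]
    others = [value for tissue, value in expression.items() if tissue != "liver"]
    if liver <= 0 or not others:
        return "not liver-selective"
    if liver >= specific_fold * max(others):
        return "liver-specific"
    if liver >= enriched_fold * median(others):
        return "liver-enriched"
    return "not liver-selective"


print(classify_liver_gene({"liver": 300, "kidney": 100, "heart": 20, "brain": 10}))
# liver-enriched
```

Comparing against the maximum makes "specific" strict: a single other tissue with high expression disqualifies the gene. The median-based test tolerates a few such tissues.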

6.  Peptide  selection  from  the  liver  protein  list 

The mouse liver protein list contains 165 proteins, of which 131 were previously detected by mass spectrometry. A total of 547 peptides suitable for SRM analyses were selected from 116 of these 131 previously detected liver proteins. Two to three peptides were selected from each protein following peptide selection criteria for SRM (Lange et al., 2008). Peptides previously identified in PeptideAtlas (Deutsch et al., 2008) were preferentially chosen.
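The selection criteria of Lange et al. (2008) are not reproduced in the report. The minimal sketch below assumes a simplified, hypothetical subset of commonly used SRM peptide rules: length 7 to 25 residues, no Met or Cys, no N-terminal Gln, and no internal K/R that would indicate a missed cleavage. Following the text, peptides already observed in PeptideAtlas are ranked first.

```python
def passes_srm_rules(peptide: str, min_len: int = 7, max_len: int = 25) -> bool:
    """Simplified, hypothetical subset of SRM peptide-selection rules."""
    if not min_len <= len(peptide) <= max_len:
        return False
    if "M" in peptide or "C" in peptide:   # oxidation- or alkylation-prone residues
        return False
    if peptide.startswith("Q"):            # N-terminal Gln can cyclize (pyro-Glu)
        return False
    # An internal K/R not followed by P means a missed tryptic cleavage.
    for i, residue in enumerate(peptide[:-1]):
        if residue in "KR" and peptide[i + 1] != "P":
            return False
    return True


def select_peptides(candidates: list[str], atlas_observed: set[str],
                    per_protein: int = 3) -> list[str]:
    """Rule-passing peptides, PeptideAtlas-observed ones first, capped per protein."""
    passing = [p for p in candidates if passes_srm_rules(p)]
    # sorted() is stable: observed peptides (key False) come first, input order kept within groups.
    ranked = sorted(passing, key=lambda p: p not in atlas_observed)
    return ranked[:per_protein]


print(select_peptides(["LVNELTEFAK", "FVEGLPINDFSR", "QAAAAAAK", "AAAAAAAR"],
                      atlas_observed={"FVEGLPINDFSR"}, per_protein=2))
# ['FVEGLPINDFSR', 'LVNELTEFAK']
```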

All peptides used in this study were checked by BLAT (http://genome.ucsc.edu/) and Protein BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) searches to ensure that they are unique to the target protein at both the proteomic and genomic levels. Finally, the uniqueness of every Q1/Q3 pair from the target peptides was confirmed with an SRM theoretical collision calculator tool (http://proteomicsresource.washington.edu/cgi-bin/srmcalc.cgi).
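Checking Q1/Q3 uniqueness starts from the precursor (Q1) and fragment (Q3) m/z values. The minimal sketch below assumes unmodified peptides (no carbamidomethylated Cys) and standard monoisotopic residue masses. It computes the precursor m/z and singly charged y-ion m/z values, and it flags two transitions as colliding when both values fall within hypothetical ±0.7 m/z windows. It only illustrates the calculation and is not the srmcalc tool itself.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide: str, charge: int) -> float:
    """Q1 value: (M + z * proton) / z for the unmodified peptide."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mzs(peptide: str) -> list[float]:
    """Singly charged y1 .. y(n-1) m/z values; index i-1 holds y_i."""
    return [sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON
            for i in range(1, len(peptide))]


def transitions_collide(t1: tuple[float, float], t2: tuple[float, float],
                        q1_tol: float = 0.7, q3_tol: float = 0.7) -> bool:
    """True if two (Q1, Q3) pairs fall within the (hypothetical) isolation windows."""
    return abs(t1[0] - t2[0]) <= q1_tol and abs(t1[1] - t2[1]) <= q3_tol


q1 = precursor_mz("FVEGLPINDFSR", 2)
q3 = y_ion_mzs("FVEGLPINDFSR")[7]   # y8 (hypothetical choice of transition)
print(f"Q1 {q1:.4f}, Q3 (y8) {q3:.4f}")
```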

7.  Monitoring  liver-specific  proteins  in  blood  by  SRM 

All  SRM  analyses  were  performed  on  an  Agilent  6460A  triple  quadrupole  (QQQ)  mass 
spectrometer  with  a  ChipCube  nanoelectrospray  ionization  source  coupled  with  an 
Agilent  1200  nanoFlow  HPLC  system. 

7.1  Detection  of  endogenous  peptides  in  pooled  plasma  samples  from  control  and 
treated  animals 

From the 131 liver proteins previously detected by mass spectrometry, 547 peptides were selected from the 116 proteins that met the peptide selection standards. Before purchasing heavy peptide standards, we performed endogenous peptide SRM analyses (all-light peptides, as opposed to heavy peptide standards) to determine how many of the native proteins and their peptides could be detected in control and treated mouse plasma samples. In this all-light test, we detected 199 peptides representing 81 proteins. Crude, unpurified peptide standards (heavy peptides) corresponding to the detected natural counterparts (light peptides) were synthesized with heavy isotope-labeled lysine (13C6,15N2) or arginine (13C6,15N4) at the C-terminus (Sigma-Aldrich, USA).
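The heavy standards co-elute with their light counterparts but are offset in mass. The offset follows from the isotope mass differences (13C minus 12C = 1.0033548 Da; 15N minus 14N = 0.9970349 Da). The minimal sketch below computes the precursor m/z shift for the C-terminal lysine and arginine labels named above.

```python
C13_DELTA = 1.0033548   # mass difference 13C - 12C (Da)
N15_DELTA = 0.9970349   # mass difference 15N - 14N (Da)

HEAVY_SHIFT = {
    "K": 6 * C13_DELTA + 2 * N15_DELTA,   # 13C6,15N2-lysine, about +8.0142 Da
    "R": 6 * C13_DELTA + 4 * N15_DELTA,   # 13C6,15N4-arginine, about +10.0083 Da
}


def heavy_mz_shift(peptide: str, charge: int) -> float:
    """m/z offset between heavy and light forms of a peptide labeled at its C-terminal K/R."""
    c_term = peptide[-1]
    if c_term not in HEAVY_SHIFT:
        raise ValueError("expected a tryptic peptide ending in K or R")
    return HEAVY_SHIFT[c_term] / charge


print(round(heavy_mz_shift("FVEGLPINDFSR", 2), 4))  # 5.0041
```

Because the label sits on the C-terminal residue, y-type fragment ions carry the same mass shift, while b-type ions do not.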
7.2 Collision energy optimization and heavy peptide titration

Collision energies (CE) calculated with Agilent's default formula were further optimized with 4 additional CE steps (±5 V and ±10 V). The best precursors, each with its 4 transitions under optimized conditions, were selected (Figure 7). Detected heavy peptides were titrated at 6 concentrations in a normal human serum background to build a titration curve and to determine the appropriate amount of each peptide standard to spike in (Figure 8).
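The CE ramp described above (a formula-predicted CE plus steps of ±5 V and ±10 V) can be sketched as follows. A linear form, CE = slope × precursor m/z + intercept, is typical of vendor defaults. The slope and intercept below are hypothetical placeholders, not values taken from Agilent documentation. In this sketch, the best step is the one with the largest summed transition area.

```python
def predicted_ce(precursor_mz: float, slope: float = 0.036, intercept: float = -4.8) -> float:
    """Linear CE prediction in volts; slope and intercept are hypothetical placeholders."""
    return slope * precursor_mz + intercept


def ce_ladder(precursor_mz: float,
              offsets: tuple[float, ...] = (-10.0, -5.0, 0.0, 5.0, 10.0)) -> list[float]:
    """Predicted CE plus the four additional steps (-10, -5, +5, +10 V)."""
    base = predicted_ce(precursor_mz)
    return [round(base + off, 1) for off in offsets]


def best_ce(summed_areas: dict[float, float]) -> float:
    """CE value with the largest summed transition area (hypothetical selection rule)."""
    return max(summed_areas, key=summed_areas.get)


print(ce_ladder(700.0))  # [10.4, 15.4, 20.4, 25.4, 30.4]
```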

7.3  Full  SRM  tests 

Based on the titration curves, an appropriate amount of each heavy peptide was spiked into each plasma sample so that the light/heavy (L/H) ratio fell within 10-fold in either direction in most cases. One or two mice (mouse #1 and #2) from each control group at every time point were analyzed by SRM. For CCl4-treated mice, two treated mice (mouse #1 and #2) were analyzed at each time point in both the NOD and B6 strains. For APAP-treated mice, because responses to treatment varied greatly at different time points, 4 mice and 3 mice were analyzed at the 8-hour and 24-hour time points, respectively, for APAP/NOD, and 4 mice and 5 mice were analyzed at the 24-hour and 48-hour time points, respectively, for APAP/B6. Each sample was run in duplicate.
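Choosing a heavy-peptide spike from a titration curve amounts to two steps: fit the heavy signal against the spiked amount, then pick the amount whose heavy signal roughly matches the endogenous light signal, so that L/H stays well inside the 10-fold window. The report does not specify how the spike level was derived. The minimal sketch below uses ordinary least squares, and its titration points are hypothetical.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def spike_amount(amounts: list[float], heavy_areas: list[float], light_area: float) -> float:
    """Heavy amount predicted to give a heavy area equal to the light area (L/H near 1)."""
    slope, intercept = fit_line(amounts, heavy_areas)
    return (light_area - intercept) / slope


def ratio_in_window(light_area: float, heavy_area: float, max_fold: float = 10.0) -> bool:
    """True if L/H lies within max_fold in either direction."""
    ratio = light_area / heavy_area
    return 1.0 / max_fold <= ratio <= max_fold


# Hypothetical six-point titration (amount in fmol, heavy peak area).
amounts = [1, 2, 5, 10, 20, 50]
areas = [1.0e3, 2.1e3, 4.9e3, 10.2e3, 19.8e3, 50.3e3]
print(round(spike_amount(amounts, areas, light_area=15.0e3), 1))
```

Aiming for L/H near 1 leaves the most room on both sides of the ±10-fold window for samples whose endogenous levels rise or fall after treatment.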

8.  SRM  data  analysis 

All SRM data were processed using the Skyline Targeted Proteomics Environment (v1.1) (MacLean et al., 2010). All data were manually inspected to ensure correct peak detection and accurate integration. Peptides with a signal-to-noise ratio of at least 3 were considered detectable. The total peak area and the light/heavy ratio of each peptide were exported for statistical analysis.
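The detection rule and the exported quantities can be combined as follows: a peak counts as detected when its S/N is at least 3, and each peptide's L/H ratio is averaged over the duplicate runs. This is a minimal sketch. The record layout is hypothetical and is not Skyline's export schema, and using the mean of the duplicates is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PeakRecord:
    """One integrated peak; a hypothetical flattened export row."""
    peptide: str
    run: str
    signal_to_noise: float
    light_area: float
    heavy_area: float


def mean_lh_ratios(records: list[PeakRecord], min_sn: float = 3.0) -> dict[str, float]:
    """Mean light/heavy ratio per peptide over runs whose peak has S/N >= min_sn."""
    ratios: dict[str, list[float]] = {}
    for rec in records:
        if rec.signal_to_noise >= min_sn and rec.heavy_area > 0:
            ratios.setdefault(rec.peptide, []).append(rec.light_area / rec.heavy_area)
    return {peptide: mean(values) for peptide, values in ratios.items()}


records = [
    PeakRecord("PEP1", "run_1", 10.0, 200.0, 100.0),
    PeakRecord("PEP1", "run_2", 8.0, 300.0, 100.0),
    PeakRecord("PEP2", "run_1", 2.0, 50.0, 100.0),   # below S/N 3: not detectable
]
print(mean_lh_ratios(records))  # {'PEP1': 2.5}
```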

9. Statistical analysis

All other analyses, including calculations and graphics, were performed with scripts written by computational biologists at ISB or with Prism 5 graphics software (GraphPad Software, La Jolla, CA, USA).

10. Western blot analysis

To investigate whether potential new marker proteins can be detected in plasma by Western blotting, non-depleted plasma samples from treated and control mice were separated on protein gels. The proteins were transferred to PVDF membranes, probed with primary antibodies, washed and incubated with HRP-conjugated secondary antibodies. Bands were then detected, and the images were analyzed using ImageJ.

Results: 

1. Mouse liver injuries induced by IP injection of APAP and CCl4, measured by ALT and AST values

Responses to drug treatment in mice were judged by the increase in plasma ALT and AST enzyme activity. In CCl4-treated mice, variation between individual animals was small in both the B6 and NOD strains. However, we observed large individual differences in responsiveness to APAP injection in both mouse strains at the peak response time points. In all cases, AST and ALT elevations paralleled each other; therefore, only ALT values are cited throughout this study. ALT values after treatment with the two drugs in both strains are summarized in Figure 7.




Figure 7. Large variation in plasma ALT levels among mice at the peak response time points was observed after APAP treatment in both B6 (A) and NOD (B) mice. Uniform results were obtained after CCl4 treatment in both B6 (C) and NOD (D) strains. Black smooth curves: average ALT levels after treatment, with standard deviation error bars.


Plasma APAP concentrations measured at the 3- and 8-hour time points after drug administration in both NOD and B6 mice were consistent among highly responsive, low-responsive and non-responsive animals. This indicates that the inter-individual differences in response to APAP treatment are not due to incorrect administration of the drug (Table 3).


Sample name*   APAP plasma concentration (µg/ml) at 3 h after injection**   ALT level (IU/ml)
NOD/3h_1       98                                                          53
NOD/3h_2       91                                                          126
NOD/3h_3       111                                                         66
NOD/3h_4       107                                                         95
NOD/3h_5       89                                                          212
B6/3h_1        51                                                          102
B6/3h_2        52                                                          280
B6/3h_3        40                                                          264
B6/3h_4        33                                                          1242
B6/3h_5        39                                                          910

Table 3. No association between plasma APAP concentration and liver toxicity (as indicated by ALT levels) 3 hours after injection in either mouse strain. *Sample name: strain/time point_mouse number. **At 8 hours after injection, plasma APAP concentrations in all mice had fallen to zero in both strains.
2. Identification of liver proteins

Using the strategies described above, we identified 165 mouse liver proteins by mining gene expression data. The human counterparts of all these proteins were verified in GeneCards (a human-only database).

3.  Endogenous  proteins  detected  by  SRM  and  confirmed  by  heavy  peptide 
standards 

After suitable peptides and transitions were selected for each liver protein, we used pooled control and treated mouse plasma to determine how many liver proteins could be detected by SRM. In total, we detected 142 peptides derived from 81 liver proteins, all of which were confirmed with spiked-in synthetic heavy peptide counterparts.

4.  CE  optimization 

Collision energy optimization was performed as described in Methods, with duplicate runs for each CE. A total of 284 charge 2+ and 3+ precursors from 142 peptides were optimized. The best transitions and collision energies were used for the full SRM tests.

5.  Consistency  and  accuracy  of  SRM  data 

5.1  Duplicate  SRM  runs  are  well  correlated 

Duplicate runs were performed for each sample. Technical variation between the two runs was small, as exemplified by two runs of peptide FVEGLPINDFSR in the CCl4/NOD study (Fig 8A). In general, Pearson correlations between runs were high (r > 0.9 in most cases).
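The run-to-run agreement above is a Pearson correlation between paired measurements from the two duplicate runs. A minimal pure-Python sketch follows; the paired values in the usage lines are hypothetical, not data from this study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


run_1 = [1.20, 0.85, 2.40, 3.10, 0.40]   # hypothetical L/H ratios, first run
run_2 = [1.15, 0.90, 2.35, 3.30, 0.38]   # same samples, second run
print(pearson_r(run_1, run_2) > 0.9)     # True
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.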

5.2  Protein  levels  measured  by  multiple  features  are  consistent 

When a protein was measured by more than one peptide, the quantifications agreed closely. This gave us reasonable confidence that protein levels estimated from a single peptide of a given protein can be reliable (Figure 8B).
Figure 8A. MRM analysis. Variation between technical replicates was generally small, with a Pearson correlation >0.9 between duplicate runs; plasma protein levels measured by peptide FVEGLPINDFSR in the CCl4/NOD test were consistent. T run_1, treated first run; T run_2, treated second run; C run_1, control first run; C run_2, control second run. 8B. Multiple peptides derived from the same protein performed consistently in most SRM tests. As shown here, plasma protein levels measured by two different peptides from the same protein were in close agreement. Relative protein levels in plasma are indicated by normalized light/heavy peptide ratios.


6. Thirty informative proteins were found to distinguish controls from APAP- or CCl4-treated mice

Of the 81 proteins tested by SRM in this study, we identified 30 proteins (including the known liver markers AST1 and AST2) that separate drug-treated mice from their controls. These 30 proteins fall into 4 categories: 1) As good: protein levels are higher in treated animals, with differences essentially as significant as those in ALT and AST levels measured by colorimetric enzymatic reactions and by SRM (AST1 and AST2); 2) Worse: protein levels are higher in treated animals, but the differences are less significant than those detected by ALT and AST; 3) Potentially better: protein levels are higher in treated animals, and the proteins are potentially better liver injury markers than ALT and AST; and 4) Level down: protein levels are lower in treated mice than in controls. The results are summarized in Table 4.
Non-informative: 51 (63%)
  Proteins 1, 3, 7, 8, 9, 10, 11, 12, 13, 14, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 39, 40, 41, 42, 43, 44, 47, 49, 51, 55, 57, 58, 59, 60, 61, 63, 65, 66, 68, 69, 70, 72, 73, 74, 76, 77, 78
Informative (30 proteins):
  Potentially better: 8 (10%): Proteins 4, 6, 15, 16, 19, 24, 45, 50
  As good: 13 (16%): Proteins 2, 17, 18, 33, 37, 46, 48, 52, 62, 64, 71, 79, 80
  Worse: 5 (6%): Proteins 5, 31, 38, 53, 56
  Level down: 4 (5%): Proteins 54, 67, 75, 81

Table 4. Results of SRM analyses of the 81 liver proteins searched for liver injury markers. The existing liver marker AST (AST1 and AST2) is included among the informative proteins in the "as good" category. Proteins are denoted Protein 1 to Protein 81 before publication (protein names will be provided upon request by the DOD). "Potentially better", "as good" and "worse" describe the new markers relative to the classic liver markers ALT and AST. "Level down" markers have lower plasma protein concentrations in treated animals than in controls.

6.1 Eight proteins are potentially better liver injury markers than the existing markers ALT and AST in our animal models

The power of targeting liver-specific proteins with SRM technology was demonstrated in this study by the identification of 30 proteins that are informative for liver injuries caused by chemical exposure. More importantly, we found 8 proteins that performed better than ALT and AST in at least one drug/strain combination and may serve as better markers of liver injury.

6.2  Five  proteins  are  confirmed  significantly  better  than  ALT  and  AST 

Of these 8 proteins, we are particularly interested in five: Protein 45, Protein 19, Protein 6, Protein 4 and Protein 16. Protein 45 is interesting because its extended presence in plasma widened the period in which CCl4-induced liver injury can be detected in NOD mice, from a narrow window (a sharp peak at 24 hours after treatment) to 24-168 hours after drug injection. This extended detection window was also observed in the other mouse strain (B6) after CCl4 treatment.

In contrast, Protein 19 plasma levels begin to increase at 3 hours and peak 8 hours after treatment, making this protein a better marker for early detection of CCl4-induced injury. Together, Protein 19 and Protein 45 cover the detectable period of CCl4-induced liver injury in NOD mice from 3 hours to at least 168 hours after drug treatment. Similar results were obtained for CCl4-induced injury in B6 mice, except that the peak response to the toxic effect of CCl4 occurred 48 hours after treatment. By contrast, ALT and AST are informative mainly at 24 hours after treatment in NOD mice. The results for Protein 19 and Protein 45 as markers of CCl4-induced liver injury in NOD mice are summarized in Figure 9.
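The complementary coverage of an early marker (Protein 19) and a late marker (Protein 45) can be expressed as a simple panel rule: a time point is flagged when any marker in the panel reaches a fold-change cutoff. The minimal sketch below assumes the 2-fold cutoff used in Table 5. The fold-change profiles are hypothetical placeholders, not the measured values.

```python
def flagged_time_points(fold_changes: dict[str, dict[int, float]],
                        cutoff: float = 2.0) -> list[int]:
    """Time points (h) at which at least one marker reaches the fold-change cutoff.

    fold_changes maps marker name -> {time_h: fold change vs. time-matched control};
    a missing time point is treated as no change (fold change 1).
    """
    times = sorted({t for profile in fold_changes.values() for t in profile})
    return [t for t in times
            if any(profile.get(t, 1.0) >= cutoff for profile in fold_changes.values())]


# Hypothetical profiles illustrating an early and a late marker.
panel = {
    "early_marker": {3: 4.0, 8: 9.0, 24: 3.0, 48: 2.5, 96: 1.0, 168: 1.0},
    "late_marker": {3: 1.0, 8: 1.0, 24: 6.0, 48: 8.0, 96: 5.0, 168: 3.0},
}
print(flagged_time_points(panel))                                  # [3, 8, 24, 48, 96, 168]
print(flagged_time_points({"late_marker": panel["late_marker"]}))  # [24, 48, 96, 168]
```

An "any marker" (logical OR) rule maximizes the detection window. The trade-off is that each added marker can also add false positives in controls.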
Figure 9. A combination of Protein 19 and Protein 45 is superior to ALT and AST in detecting liver injuries induced by CCl4 in NOD mice. A) SRM and enzymatic reaction data at time points after exposure to CCl4. AST1 and AST2 plasma levels were elevated mainly at the 24 and 48 h time points as measured by SRM, and at the 24 h time point as measured by enzymatic reactions. B) Protein 45 levels are significantly higher from 24 to 96 h (measured by two Protein 45 peptides in SRM analysis), and Western blot analysis is consistent with the SRM results. Using Protein 45 as a liver injury marker extends the detectable period to at least 168 hours after treatment. C) Protein 19 was detected early, at 3 h after CCl4 treatment, and was still highly detectable at the 48 h time point (measured by SRM with two Protein 19 peptides and by Western blot). D) Top: Western blot image of AST1; the AST1 band appears mainly at the 24 h time point after treatment. Middle: Western blot image of ALT. Bottom: Western blot image showing that a combination of Protein 19 and Protein 45 detected liver injuries from 3 to 168 hours after CCl4 exposure. Similar results were observed in B6 mice treated with CCl4. T1: treated mouse 1; T2: treated mouse 2; C: control mouse 1 at each time point. In all cases, mouse 1 at each time point was used for Western blotting.


In acetaminophen-treated NOD mice, the peak response time point is 8 hours after treatment. At the 3-hour time point, ALT enzyme tests of both mouse 3h1 (3-hour time point, mouse number 1, and so on) and mouse 3h2 failed to show any increase, although pathologic microvesicular changes were clearly present. However, Protein 4 levels measured by SRM increased significantly at the 3-hour time point. Similarly, at the 96-hour time point, mouse 96h4 showed no increase in ALT level, whereas its Protein 45 level was significantly increased. More strikingly, Western blots of AST1, AST2 and ALT revealed protein bands only at the 24-hour time point for ALT, at the 8- and 24-hour time points for AST1, and at the 24- and 72-hour time points for AST2. In contrast, Western blots showed that Protein 45 is a better marker than ALT and AST at the 24, 48, 72 and 96 hour time points, with a clear Protein 45 band in each lane. ALT enzymatic results, SRM results for Protein 4, Protein 6, Protein 16 and Protein 45 in APAP-treated NOD mice, and the corresponding Western blot results are summarized in Table 5.


Time points (mouse): No treatment, 3h2, 8h2, 24h4, 48h2, 72h4, 96h4, 120h2, 144h2, 168h2

Protein/peptide: Enzyme/SRM fold changes*
ALT enzyme: 1, 1, 11, 8, 43, 12, 24, 1, 1, 1
Protein 4**: 1, 4, 71, 4, 3, 1, 1, NT, NT, NT
Protein 6: 1, 1, 66, 43, 51, 121, 1, NT, NT, NT
Protein 45**: 1, 1, 10, 23, 9, 557, 115, 0, 4, NT, NT, NT
Protein 16**: 1, 1, 26, 6, 29, 148, 185, 1, NT, NT, NT

Proteins: Western blot bands***
GPT, AST1, AST2: ±
Protein 4: NT, NT, NT, NT, NT, NT, NT, NT, NT
Protein 6: ±, a, B, D, D
Protein 45: 1, 1, D, a, D, 0, ±, ±
Protein 16: ±, ±, 1, D, ±, □

Table 5. The protein level changes listed in this table are from mice that were highly responsive to drug treatment at each time point. Plasma ALT levels were measured by enzymatic reactions (IU/L); Protein 4, Protein 6, Protein 16 and Protein 45 levels were measured by SRM (L/H ratios). Fold changes were calculated as treated level/control level at each time point. NT: not tested.

* A fold change of less than 2 is reported as 1.
** Protein not detected in controls; the highest noise level was used as the control value to calculate fold changes. These fold changes are therefore relative; strictly, they are infinite.

*** ±, weak band; 0, strong band; blank, no band. Protein 4 was not tested by Western blotting because no suitable antibody was available.
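The fold-change conventions in the footnotes above can be written as a small function. Fold change is treated/control at each time point, values below 2-fold are reported as 1, and the highest noise level replaces the control value when the protein is not detected in the control. This is a minimal sketch; the argument names are hypothetical.

```python
from typing import Optional


def reported_fold_change(treated: float, control: Optional[float],
                         highest_noise: float, min_fold: float = 2.0) -> float:
    """Fold change following the Table 5 footnotes.

    control is None when the protein was not detected in the control; the
    highest noise level then stands in for the control value, so the result
    is a relative lower bound rather than a true ratio.
    """
    denominator = control if control is not None else highest_noise
    fold = treated / denominator
    # Changes below min_fold are reported as "no change" (fold = 1).
    return fold if fold >= min_fold else 1.0


print(reported_fold_change(1.5, 1.0, highest_noise=0.2))    # 1.0
print(reported_fold_change(43.0, 1.0, highest_noise=0.2))   # 43.0
```

Substituting the noise level for an undetected control keeps the ratio finite. Because the true control level lies below the noise, the reported value understates the real change.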


Summary and discussion: We adopted a strategy based on liver-specific proteins to discover liver injury biomarkers in blood. The approach is centered on the idea that the blood concentrations of organ-specific proteins can be used to monitor the status of a specific organ, because changes in these concentrations reflect whether their cognate biological networks are normal, toxin-exposed or disease-perturbed. We mined transcriptomic databases to identify organ-specific proteins and created a list of 165 liver proteins. We used mass spectrometry-based SRM technology to selectively target liver-specific proteins in the blood of control and drug-treated animals, and we established the workflow illustrated in Figure 10. Using this SRM strategy targeting liver proteins, we assayed 81 liver-specific blood proteins in the acetaminophen and CCl4 model systems in two mouse strains.
Figure 10. Workflow employed in this project for finding new liver injury biomarkers.


One of the greatest challenges in analyzing the plasma proteome is the complexity and extremely wide dynamic range of protein concentrations in blood. Blood chemistry is the ultimate window into one's health, and the human plasma proteome holds the promise of a revolution in disease diagnosis and therapeutic monitoring. Sampling blood is one of the least invasive methods of biological sample collection. However, blood contains tens of thousands of different proteins from all tissues, plus numerous distinct immunoglobulin sequences, with concentrations spanning an extraordinary dynamic range of 12 orders of magnitude. This complexity and dynamic range make detection of individual proteins of interest very difficult. To address this challenge, we adopted SRM technology to selectively monitor a limited number of proteins of interest (81 liver proteins) instead of randomly sampling a small portion of all blood proteins, as the shotgun mass spectrometry method does. Like antibodies, SRM can target specific peptides and proteins, but without the time and cost of antibody development; it can be highly sensitive and high-throughput, and it can be multiplexed to monitor more than 50 proteins in a single run. When used in combination with isotopically labeled standards, it can also quantify the corresponding endogenous proteins in biological samples. To reduce the complexity of blood proteins, we used immunoaffinity columns to remove the most abundant proteins from the samples, allowing study of our less abundant liver protein targets. The 14 most abundant proteins make up 90% of total blood protein, so their removal resulted in a 10-fold enrichment of the proteins of interest in the samples and increased detection sensitivity. In a separate study, we observed significant sample-to-sample variation with the Seppro® IgY14 spin column; adopting an ÄKTA FPLC system coupled with a Seppro® IgY14 LC5 depletion column greatly improved the reproducibility of sample preparation (Qin et al., 2012). The sensitivity of the tests was further increased by careful optimization of collision energy and selection of the best-fit peptides and transitions. One key step in our workflow that may have contributed to finding new markers of liver injury was a pilot study of how many endogenous proteins could be detected under our SRM conditions in treated as well as control plasma samples. Had only the controls been used to establish which liver proteins are detectable in plasma by SRM, and only those proteins been monitored in the full drug-treatment tests, many targets would have been missed, because most of the promising new markers were present in the circulating blood only after drug-induced liver damage had occurred, not in untreated mice.
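The 10-fold figure follows from the 90% share. Removing a fraction f of the total protein leaves 1 - f, so when the same total amount of protein is analyzed, each remaining protein is concentrated by 1/(1 - f). A minimal sketch of that arithmetic, assuming complete removal of the 14 targeted proteins and no co-depletion of other proteins:

```python
def depletion_enrichment(fraction_removed: float) -> float:
    """Fold enrichment of the remaining proteins at equal total protein load."""
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


print(round(depletion_enrichment(0.90), 6))  # 10.0
```

In practice, partial depletion or loss of bound low-abundance proteins would make the real enrichment smaller than this ideal value.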

The power of this approach has been demonstrated in this study, in which we found 30 liver proteins that can be used to monitor liver injury after chemical or drug exposure in mice. In addition, 5 proteins showed strong evidence that they may be better than the long-established liver biomarkers ALT and AST.

Unlike  AST  and  ALT  enzymatic  reactions,  these  5  new  protein  markers  can  be 
integrated  into  antibody-  or  synthetic  capture  agent-based  microfluidic  chips  (Integrated 
Blood-Barcode  Chips),  devices  that  have  the  potential  to  analyze  large  numbers  of 
patient  samples  rapidly  (in  a  few  minutes),  inexpensively,  and  in  a  highly  multiplexed 
format  (100s  or  even  1000s  of  different  assays  investigating  many  different  diseases) 
employing  blood  from  a  pinprick. 

Finding biomarkers has never been easy. Although billions of dollars have been invested by big pharma, private investors and government grants, on average only about one new biomarker has been discovered each year. We have discovered and confirmed that 5 proteins are superior to ALT and AST in mouse models. We believe it would be a great waste to stop here, and we hope to obtain additional support to investigate these potential new liver markers in human subjects and to explore the possibility of conducting clinical trials with them.

References: Manuscript in preparation: Qin, S., Gray, L., et al., Finding better blood markers for acute liver injuries induced by acetaminophen and carbon tetrachloride. This manuscript has not yet been submitted to the DOD for approval.

Deutsch,  E.  W.,  Lam,  H.,  Aebersold,  R.,  PeptideAtlas:  a  resource  for  target  selection 
for  emerging  targeted  proteomics  workflows.  EMBO  Rep.  2008,  9,  429-434. 

Lange,  V.,  Picotti,  P.,  Domon,  B.,  Aebersold,  R.,  Selected  reaction  monitoring  for 
quantitative  proteomics:  a  tutorial.  Mol  Syst  Biol.  2008,  4,  222. 

MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M., et al., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966-968.

Qin, S., Zhou, Y., Lok, A. S., Tsodikov, A., Yan, X., Gray, L., Yuan, M., Moritz, R. L., Galas, D., Omenn, G. S., Hood, L., SRM targeted proteomics in search for biomarkers of HCV-induced progression of fibrosis to cirrhosis in HALT-C patients. Proteomics 2012, 12, 1-9.

Tammen, H., Specimen collection and handling: standardization of blood sample collection. In: Vlahou, A. (Ed.), Clinical Proteomics: Methods and Protocols. Methods in Molecular Biology, vol. 428, p. 35.


B.  Analysis  of  multiple  organ  response  to  acetaminophen  exposure  (Bingyun  Sun 
reporting) 
Introduction: Blood is an ideal window for checking disease and health status. Using the mouse as an animal model of acetaminophen-induced toxicity, I assessed multi-organ responses to acetaminophen overdose by analyzing the blood toxicoproteome. Tissue- or organ-specific protein signatures can leak or be secreted into the bloodstream as a result of acetaminophen (APAP) toxicity. Clinical observations and animal studies have both indicated that idiosyncratic acetaminophen toxicity can impair multiple tissues and organs. To comprehensively examine such global responses, I developed a sensitive blood proteomics strategy (Figure 11). It combines immunodepletion of an abundant blood protein, albumin; a special N-terminal labeling technique for quantification; and a glycopeptide-capture method that fractionates the blood proteome into non-glycopeptides and glycopeptides with their glycans removed. Together, these steps reduced blood protein complexity and improved the accuracy of mass spectrometry (MS) identification and quantification. We identified multiple organ responses to APAP in our mouse model, including kidney, heart, muscle, bone marrow, brain, intestine and adipose tissue. Both Western blotting and targeted MS analyses were carried out, and we validated a list of organ-specific proteins that can be used as location markers for disease, especially for toxicity diagnosis. Our results agree with previous knowledge of APAP toxicity obtained in human and animal studies using techniques such as histopathology and radioisotope tracing. This is the first comprehensive discovery of multi-organ responses through a blood proteomics effort.


[Figure 11 flowchart: control and APAP sera; albumin depletion; trypsin digestion; N-isotag
labeling; glycopeptide capture into glycopeptide and non-glycopeptide fractions; strong cation
exchange fractionation; RP-LC-MS/MS]

Figure 11. Illustration of the gagQP (glycocapture-assisted global quantitative proteomics)
strategy for serum, in which albumin-depleted serum is subjected to denaturation and trypsin
digestion, followed by labeling, glycopeptide capture and reversed-phase LC-MS/MS analysis.

Results: This study introduced two major technical improvements in proteomics. First, we
used a new N-isotag labeling reagent (Figure 12). This reagent increases the MS
identification efficiency of labeled peptides, and the large mass shift it creates between
the quantified biological samples gives more flexibility in choosing an appropriate mass
spectrometer for quantification. Second, we adapted a glycopeptide capture method as a
fractionation scheme to simplify the blood proteome and to remove glycan interference with
MS identification of peptides, as illustrated in Figure 11.
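The report does not specify the capture chemistry or how glycans were removed. As a minimal sketch, assuming an N-linked glycopeptide workflow in which PNGase F releases the glycans (a common choice, not confirmed by the source), formerly glycosylated peptides can be recognized by the N-X-S/T sequon (X not Pro), and PNGase F leaves each such Asn as Asp, a +0.984 Da shift. The names `sequon_sites` and `deglycosylation_shift` are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: glycans were released with PNGase F, which converts the
# formerly glycosylated Asn to Asp (monoisotopic shift +0.984016 Da).
DEAMIDATION_DA = 0.984016

# N-X-S/T sequon with X != Pro; the lookahead lets overlapping sequons match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(peptide: str) -> list[int]:
    """Return 0-based positions of Asn residues that sit in an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(peptide)]


def deglycosylation_shift(peptide: str) -> float:
    """Mass shift expected if every sequon Asn was glycosylated and then released."""
    return len(sequon_sites(peptide)) * DEAMIDATION_DA
```

For example, `sequon_sites("LCPDCPLLAPLNDSR")` returns `[11]`, while `"ANPSK"` has no valid sequon because Pro occupies the X position.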


[Figure 12 labels: reagent t-Boc-Leu-NHS (15N, 13C); Isotag = Leu; Isotag-Peptide-R,
H-L = 7 Da; Isotag-Peptide-K, H-L = 14 Da]

Figure 12. Chemical structure of the heavy form (13C and 15N) of the N-isotag and its
modification of peptides.
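The nominal 7 Da and 14 Da heavy-light differences in Figure 12 follow from the isotope content of the tag. A sketch, assuming the heavy tag is leucine with all six carbons as 13C and its single nitrogen as 15N, and that the NHS ester reacts with the peptide N-terminus and with each lysine side-chain amine, so a tryptic peptide ending in K carries two tags. The helper name `heavy_light_shift` is illustrative.

```python
# Monoisotopic heavy-minus-light isotope mass differences (Da).
C13_MINUS_C12 = 1.0033548
N15_MINUS_N14 = 0.9970349

# Assumed tag: leucine (C6H13NO2) with 6 x 13C and 1 x 15N.
TAG_SHIFT_DA = 6 * C13_MINUS_C12 + 1 * N15_MINUS_N14  # about 7.017 Da per tag


def heavy_light_shift(peptide: str) -> float:
    """Expected heavy-minus-light mass difference for a tryptic peptide.

    Assumes one tag on the peptide N-terminus plus one on each lysine
    side-chain amine (NHS esters react with primary amines).
    """
    n_tags = 1 + peptide.count("K")
    return n_tags * TAG_SHIFT_DA
```

For an Arg-terminated peptide such as "AVLGR" this gives about 7.017 Da (one tag), and for "LVNELTEFAK" about 14.034 Da (two tags). Peptides with missed cleavages at internal lysines would carry additional tags.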

Using this strategy, we identified a set of organ-specific protein markers in blood, many of
which responded to acetaminophen toxicity, as summarized in Table 6.


Tissue          Gene symbol        MS quantification change
Bone marrow     Lcp1*              3.8
                Ltf                2.9
                Pip*               21
Brain           Ptgds              2.8
Fat             Me1                8.7
Heart/Muscle    Mb*                0.8
Muscle          Ckm                2.1
Kidney          Selenbp2 (SBP)*    6.4
                Mdh1               6.4
                Sardh              8.5
                Umod               5.3
Skin            Car6               2.3
Intestine       Apoa4              2.1
                Fabp1              204
                Ppm1a              2.2

Table 6. Protein candidates for APAP-overdose-induced extrahepatic responses, derived from
the global MS survey. * marks proteins validated by Western blotting, as shown in Figure 13.
We validated both the organ specificity of the discovered protein markers and their toxicity
responses in blood. The Western blotting validation results are summarized in Figure 13. Our
quantitative MS results also agreed well with observations made with other characterization
approaches, such as surface plasmon resonance (SPR) arrays used to measure protein
concentrations in blood (Table 7).
Figure 13. A: Western blot validation of the tissue specificity of marker proteins identified
through blood toxicoproteomics, and of their response to the drug in control and treated
mouse sera. B: Western blot validation of selected protein marker responses in mouse sera as
a function of time after LD50 APAP treatment.
[Table 7 (not reproduced): 18 liver-specific proteins, including CPS1, CAT, GPT1 (SGPT),
GLUD1, FAH, Ahcy, Ftcd, HPD and HGD, each marked with up or down triangles for the
toxicoproteomics and/or SPR measurements]

Table 7. Liver-specific proteins identified as differentially expressed in blood in response
to APAP challenge by our toxicoproteomics and surface plasmon resonance (SPR) techniques. Up
and down solid triangles indicate increased and decreased serum concentrations, respectively.

To our knowledge, this proteomics effort is the first to demonstrate the strength of
high-throughput MS protein analysis for comprehensively and sensitively identifying global
responses to drug toxicity. The organ specificity of the identified mouse protein markers has
also been tested on their human orthologs. Many of these proteins carry the same tissue/organ
signature, so the validated organ marker proteins can be used directly to indicate the organ
location of human diseases.

Publications:  A  manuscript  is  being  prepared  and  will  be  submitted  to  DOD  for  approval. 

Glycocapture-Assisted  Global  Quantitative  Proteomics  (gagQP)  Reveals  Multiorgan 
Response  in  Blood  Toxicoproteome.  Bingyun  Sun,  Jeffrey  A.  Ranish,  Angelita  G.  Utleg, 
Zhiyuan  Hu,  Andrew  Keller,  Shizhen  Qin,  Cynthia  Lorang,  Li  Gray,  Amy  Brightman, 
Denis  Lee,  Vinita  M.  Alexander,  Robert  L.  Moritz,  Leroy  Hood* 

C.  Summary  of  the  pooled  mice  biomarker  study  (Christopher  Lausted,  Zhiyuan  Hu, 
Hyuntae  Yoo  reporting). 
Introduction: Over the course of this project, we utilized a mouse model and in vitro
models  to  study  liver  injury,  analyzed  these  biological  models  using  novel  antibody 
microarrays  (see  Aim  6  below)  as  well  as  immunoblotting  and  mass  spectrometry, 
discovered  fifteen  new  potential  blood  biomarkers,  and  developed  novel  antibody-based 
assays  for  these  markers. 

Biomarker  Discovery.  Using  pooled  mouse  samples  in  a  protocol  following  a  time 
course  after  exposure  to  acetaminophen,  fifteen  liver-specific  blood  proteins  were 
identified  as  markers  of  acetaminophen  (APAP)-induced  hepatotoxicity  using  three 
proteomic  technologies:  label-free  antibody  microarrays,  quantitative  immunoblotting, 
and  targeted  iTRAQ  mass  spectrometry.  Liver-specific  blood  proteins  produced  a 
toxicity  signature  of  eleven  elevated  and  four  attenuated  blood  protein  levels.  These 
blood  protein  perturbations  begin  to  provide  a  systems  view  of  key  mechanistic  features 
of  APAP-induced  liver  injury  relating  to  glutathione  and  S-adenosyl-L-methionine  (SAMe) 
depletion,  mitochondrial  dysfunction,  and  liver  responses  to  the  stress.  Two  markers, 
elevated  membrane-bound  catechol-O-methyltransferase  (MB-COMT)  and  attenuated 
retinol binding protein 4 (RBP4), report hepatic injury significantly earlier than the current
gold  standard  liver  biomarker  metric,  alanine  transaminase  (ALT).  These  biomarkers 
were  perturbed  prior  to  onset  of  irreversible  liver  injury.  Five  of  these  mouse  liver- 
specific  blood  markers  had  human  orthologs  that  were  also  found  to  be  responsive  to 
human  hepatotoxicity.  These  proteins  appear  along  with  conventional  biomarkers  and 
non-organ  specific  potential  biomarkers  in  Figure  14. 
Figure 14. Potential biomarkers observed. A) Using label-free antibody arrays (SPRI), Western
blotting (WB), and iTRAQ (MS), 24 protein level changes were observed in mouse plasma. Four
proteins are conventional liver function biomarkers and 20 are potentially novel biomarkers.
Increased, decreased and unchanged measurements are indicated by arrow direction.
B) Quantitative plasma profiles of four secretory proteins. C) Quantitative plasma profiles of
ALT and novel biomarkers. ALT indicates injury at 3 hours, peaks at 12 hours, and returns to
baseline at 96 hours. ALT levels here were assayed by enzymatic activity and reported in
IU/mL. COMT and CPS1 show patterns apparently different from that of ALT. D) Quantitative
liver lysate profiles of the novel blood biomarkers. Lysate profiles differed greatly from the
plasma profiles: the increase in COMT levels was shifted later in time, while the other
protein levels were attenuated between 24 and 48 hours.
Our in vivo experimental approach entailed injecting C57BL/6 mice with a half-lethal dose of
APAP and then studying the dynamic changes in liver-specific protein concentrations in liver
and blood. About 30% of the mice died between 24 and 48 hours. Histological staining of liver
sections showed progressive necrosis, reaching a maximum at 24 hours post-injection. In
surviving mice, liver histopathology returned to normal between 72 and 192 hours. Blood
ALT/AST levels indicated clear injury starting at 3 hours post-injection, with maximal values
(>10,000 IU/L) between 12 and 24 hours, followed by a gradual decrease to normal levels.
Including previously identified blood proteins, 24 proteins were observed to correlate with
injury using SPRI microarrays, immunoblotting, and mass spectrometry.

Publications  and  inventions:  This  work  has  been  submitted  for  publication  (manuscript 
draft  approved  by  the  DOD).  “Blood  protein  signature  for  hepatotoxicity — systems 
strategy  for  organ-specific  biomarker  discovery.”  Zhiyuan  Hu,  Christopher  Lausted, 
Xiaowei Yan, Hyuntae Yoo, Amy Brightman, and Leroy Hood. (Submitted to Molecular
and  Cellular  Proteomics.) 

A  patent  application  has  also  been  filed  in  the  USA  and  Europe. 

•  United  States  Patent  Application  Serial  Number  12/785,279  entitled  NEW 
BIOMARKERS  FOR  LIVER  INJURY,  filed  21  May  2010,  assigned  to  the  Institute 
for  Systems  Biology 

•  PCT/US2010/035829 entitled NEW BIOMARKERS FOR LIVER INJURY, filed
21  May  2010,  assigned  to  the  Institute  for  Systems  Biology. 


D.  Analysis  of  the  transcriptome  after  acetaminophen  exposure  (Kai  Wang  reporting) 

Analysis of mRNA in liver. We conducted detailed studies of the effects of acetaminophen
overdose on the mouse liver transcriptome. Consistent with earlier reports, most
metabolism-related biological processes, including pathways associated with lipid, amino
acid, carbohydrate, and nucleotide metabolism, were significantly suppressed in the liver,
while various cell proliferation and cell signaling pathways were enhanced after
acetaminophen overdose (Table 8).

Glutathione is one of the key players in neutralizing the toxic acetaminophen metabolite
N-acetyl-p-benzoquinone imine (NAPQI). In contrast to transcripts in other metabolism-related
pathways, transcripts encoding enzymes of glutathione metabolism, such as various
glutathione S-transferases, glutathione reductase and glutathione synthetase, were all
significantly induced after acetaminophen exposure. This is also indicated by the strong
enrichment of the glutathione metabolism pathway among genes up-regulated after
acetaminophen overdose (Table 8). This suggests a compensatory mechanism by which cells try
to replenish glutathione in order to metabolize acetaminophen and its toxic metabolite.

The ability to construct biologically meaningful gene networks and modules from
transcriptome studies is critical for contemporary systems biology. We devised a method, the
Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various
gene-gene pairwise similarity values, including information obtained from gene expression,
protein-protein interactions and GO annotations, into the construction of modules using
affinity propagation clustering. Compared with previously reported algorithms, modules
identified by SSIM showed significantly stronger association with biological functions.
Specifically, SSIM is effective at identifying coherent functional modules in which genes are
highly co-expressed, interconnected via protein-protein interactions, and functionally
similar in terms of GO annotations (Figure 15). The SSIM approach can also reveal the
hierarchical structure of gene modules, giving a broader functional view of the biological
system. Hence, the method can facilitate comprehensive, in-depth analysis of high-throughput
experimental data at the gene network level.
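A minimal sketch of the similarity-integration idea behind SSIM, under our own simplifying assumptions: expression similarity is Pearson correlation rescaled to [0, 1], PPI similarity is 1 for directly interacting pairs and 0 otherwise, GO semantic similarities are precomputed, and the three are averaged with equal weights. The published SSIM similarity measures and weighting may differ, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combined_similarity(genes, expression, ppi_edges, go_sim,
                        weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of expression, PPI and GO similarities for each gene pair.

    expression: gene -> expression profile
    ppi_edges:  set of frozenset({gene_a, gene_b}) for interacting pairs
    go_sim:     frozenset({gene_a, gene_b}) -> GO semantic similarity in [0, 1]
    """
    w_expr, w_ppi, w_go = weights
    n = len(genes)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(genes):
        for j, b in enumerate(genes):
            if i == j:
                continue  # diagonal is left for the clustering "preference"
            pair = frozenset((a, b))
            s_expr = (pearson(expression[a], expression[b]) + 1) / 2  # map [-1, 1] to [0, 1]
            s_ppi = 1.0 if pair in ppi_edges else 0.0
            s_go = go_sim.get(pair, 0.0)
            matrix[i][j] = w_expr * s_expr + w_ppi * s_ppi + w_go * s_go
    return matrix
```

The resulting square matrix is the kind of input an affinity-propagation implementation accepts; for example, scikit-learn's `AffinityPropagation(affinity="precomputed")` clusters a precomputed similarity matrix, with the diagonal "preference" controlling how many modules emerge.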
Table 8. Molecular pathways in liver affected by acetaminophen overdose. The p values were
calculated with the Database for Annotation, Visualization and Integrated Discovery (DAVID;
http://david.abcc.ncifcrf.gov/home.jsp).
Figure 15. Comparison between modules identified by ICMg and SSIM. One of the modules
identified by the ICMg method (ICMg module) shares a number of genes with two separate
modules identified by SSIM (SSIM modules A and B). Expression profiles of genes in the three
modules are shown on the left and protein-protein interaction networks in the middle. Genes
shared by the ICMg and SSIM modules are indicated by pink and light blue nodes, and genes
belonging exclusively to the ICMg module, SSIM module A and SSIM module B are depicted by
white, red and blue nodes, respectively. The right panel summarizes the enriched GO BP terms
(p < 1×10^-5) and their uncorrected p-values for each module (adapted from Cho et al. 2011,
BMC Systems Biology).


Analysis of mRNA transcripts in blood: Several reports have described the possibility of
using the levels of liver-specific transcripts in plasma as biomarkers of pathological
conditions in the liver. We adapted next-generation sequencing technology to characterize the
normal plasma RNA spectrum in detail, as well as the effect of acetaminophen overdose on
circulating RNAs. This allows us to explore the complexity of the plasma RNA spectrum, improve
our understanding of the effects of acetaminophen overdose, and identify more informative
biomarkers for liver injury. Because of the low RNA concentration in plasma, we modified the
sequencing library preparation protocol. With the modified protocol, we obtained about 28
million reads from each sample. After trimming the adaptors and removing low-quality,
polyA-only and adaptor-only sequences, we obtained about 2 to 9 million processed reads from
each plasma sample.
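The report does not list the exact filtering criteria. A minimal sketch of the filters named above (adaptor trimming, then removal of adaptor-only, polyA-only and low-quality reads), assuming single-end reads with Phred+33 quality strings. The adaptor shown is a commonly used Illumina small-RNA 3' adaptor standing in for whatever adaptor was actually used, and the 15-nt minimum length and mean Q20 cutoff are placeholder values, not parameters from this study.

```python
from __future__ import annotations

# Placeholder 3' adaptor; substitute the adaptor actually used in the library prep.
ADAPTOR = "TGGAATTCTCGGGTGCCAAGG"
MIN_LENGTH = 15          # assumed minimum insert length after trimming
MIN_MEAN_QUALITY = 20    # assumed mean Phred quality cutoff


def trim_adaptor(seq: str, adaptor: str = ADAPTOR, min_overlap: int = 6) -> str:
    """Remove a full adaptor match, or a partial adaptor at the read's 3' end."""
    idx = seq.find(adaptor)
    if idx != -1:
        return seq[:idx]
    # Longest adaptor prefix that matches the end of the read wins.
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


def process_read(seq: str, qual: str) -> str | None:
    """Return the trimmed insert, or None if the read should be discarded."""
    insert = trim_adaptor(seq)
    qual = qual[:len(insert)]
    if len(insert) < MIN_LENGTH:   # adaptor-only or too short
        return None
    if set(insert) == {"A"}:       # polyA-only
        return None
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)  # Phred+33 encoding
    if mean_q < MIN_MEAN_QUALITY:  # low quality
        return None
    return insert
```

Checking the full adaptor first and then progressively shorter 3' prefixes handles both short inserts (adaptor fully sequenced) and long inserts where only the start of the adaptor fits in the read.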

In addition to the significant changes in miR-122 and miR-192 levels, similar to what we
reported earlier (Wang et al 2009 PNAS) and describe below, we also observed a significant
increase in several liver-specific mRNAs, including albumin (Alb), ferritin (Fth1) and
apolipoprotein A2 (Apoa2) (Figure 16). We are conducting a more detailed analysis of the
sequence data and preparing a manuscript to report the findings.
Figure 16. Changes in the levels of several liver-specific genes in plasma. The Y-axis shows
the log2 change in the number of reads between treated and control samples, and individual
time points post acetaminophen exposure are indicated on the X-axis. Gene identities are
listed on the right of the figure.
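Figure 16 plots log2 changes in read counts. The report does not state how counts were normalized; a sketch, assuming counts are scaled to reads per million processed reads (so samples yielding 2 to 9 million processed reads are comparable) and that a pseudocount of 1 is added to avoid log2(0). The function name is hypothetical.

```python
import math


def log2_fold_change(treated_count: int, treated_total: int,
                     control_count: int, control_total: int,
                     pseudocount: float = 1.0) -> float:
    """log2 of treated vs control reads per million processed reads (RPM).

    The pseudocount (an assumption) avoids taking log2 of zero when a
    transcript has no reads in one sample.
    """
    treated_rpm = (treated_count + pseudocount) / treated_total * 1e6
    control_rpm = (control_count + pseudocount) / control_total * 1e6
    return math.log2(treated_rpm / control_rpm)
```

Normalizing by library depth matters here: the same raw count in a library twice as deep corresponds to half the relative abundance, that is, a log2 change of -1.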


Analysis of microRNA: MicroRNAs (miRNAs) are non-coding regulatory RNAs 19-23 nucleotides
long. With support from DOD, we pioneered the use of miRNAs in toxicology. Using the
acetaminophen overdose animal model, we demonstrated that the levels of the circulating
liver-enriched microRNAs miR-122 and miR-192 are far more sensitive markers of liver injury
than traditional serum aminotransferase markers (Figure 17).


Figure 17. MicroRNAs are more sensitive markers of liver injury than SGPT. Comparison of the
levels of miR-122 (red bars), miR-192 (green bars) and SGPT (blue line) in plasma samples
collected from mice 1 hour (A) and 3 hours (B) after exposure to different doses of
acetaminophen (indicated on the X-axis). The relative change of miRNA levels, expressed as the
log2 ratio of each treatment condition to the corresponding control, is indicated on the left
axis, and the SGPT scale is on the right. Values of miRNA fold change and SGPT levels are the
averages of 4 independent samples from each time point, and standard deviations are shown as
error bars (adapted from Wang et al. 2009, PNAS).


This finding was reported by Wang et al. in 2009. Most pharmaceutical companies now use
circulating miR-122 levels as a routine screen in their therapeutics development pipelines.


To broaden the utility of miRNAs as biomarkers, we also profiled the miRNA composition of 12
different types of body fluids (amniotic fluid, breast milk, bronchial lavage, cerebrospinal
fluid, colostrum, peritoneal fluid, plasma, pleural fluid, saliva, seminal fluid, tears, and
urine) to better understand the distribution and composition of miRNAs in different body
fluids (Figure 18). Some of the miRNAs in urine are associated with different pathological
conditions, suggesting that miRNAs could serve as biomarkers for various physiopathological
conditions.


Figure 18. Based on the profiles of commonly expressed miRNAs, unsupervised hierarchical
clustering groups the body fluid types into two major clusters, with plasma separated from
both (adapted from Weber et al. 2010, Clin Chem).
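Figure 18 does not specify the distance or linkage used. A minimal sketch of unsupervised hierarchical clustering, assuming average linkage on 1 minus the Pearson correlation between body-fluid miRNA profiles, with merging stopped at a chosen number of clusters (two major groups in Figure 18). The function names are hypothetical, and profiles are assumed non-constant so the correlation is defined.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[list[str]]:
    """Agglomerative clustering with average linkage on 1 - Pearson correlation.

    profiles: sample name -> miRNA levels (same miRNA order in every profile).
    Repeatedly merges the closest pair of clusters until n_clusters remain.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b]) for a in names for b in names}

    def linkage(c1: list[str], c2: list[str]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Correlation distance groups fluids by the shape of their miRNA profiles rather than by overall RNA yield, which differs widely between fluids.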

The  ideal  biomarker  should  fit  a  number  of  criteria  depending  on  how  the  biomarker  is 
to  be  used  (Table  9,  adapted  from  Etheridge  et  al  2011  Mutation  Res).  It  should  be 
accessible  through  non-invasive  methods,  specific  to  the  disease  or  pathology  of 
interest,  a  reliable  indication  of  disease  before  clinical  symptoms  appear  (early 
detection),  sensitive  to  changes  in  the  pathology  (disease  progression  or  therapeutic 
response), and easily translatable from model systems to humans. miRNAs meet many of these
criteria: they are stable in various bodily fluids, the sequences of most miRNAs are conserved
among species, the expression of some miRNAs is specific to tissues or biological stages, and
miRNA levels can be easily assessed by various methods, including polymerase chain reaction
(PCR), which allows for signal amplification.


Characteristic   Criteria
Specific         Specific to diseased organ or tissue
                 Able to differentiate pathologies
Sensitive        Rapid and significant release upon the development of pathology
                 Long half-life in sample
Predictive       Proportional to degree of severity of pathology
Robust           Rapid, simple, accurate and inexpensive detection
                 Unconfounded by environment and unrelated conditions
Translatable     Data can be used to bridge pre-clinical and clinical results
Non-invasive     Present in accessible fluid sample

Table 9. Characteristics of an ideal biomarker.
Some miRNAs frequently carry sequence variations, termed isomirs. To better understand the
extent of miRNA sequence heterogeneity and its potential implications for miRNA function and
measurement, we conducted a comprehensive survey of miRNA sequence variations in human and
mouse samples using next-generation sequencing platforms. Our results suggest that the process
generating this isomir spectrum might not be random and that heterogeneity at the ends of
miRNAs affects the consistency and accuracy of miRNA level measurements. In addition, we
constructed a database from our sequencing data that catalogs the entire repertoire of miRNA
sequences (http://galas.systemsbiology.net/cgi-bin/isomir/find.ph; Figure 19). The database
enables users to determine the most abundant sequence and the degree of heterogeneity for each
individual miRNA species. This information will be useful both for understanding the
functions of isomirs and for improving probe or primer design for miRNA detection and
measurement (Lee et al. 2010, RNA).


Figure 19. A screenshot of the isomir database. The aligned reads and corresponding counts are
shown. The first plot shows the base frequencies and the second plot shows the frequency of
mature miRNA end positions. Sequences that match miRBase sequences perfectly are shown in pink
and the most abundant sequence is displayed in bold (adapted from Lee et al. 2010, RNA).
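The database reports, for each miRNA, the most abundant sequence and the degree of heterogeneity. A sketch of how such per-miRNA statistics could be computed from collapsed read counts. The definitions here are our assumptions, not necessarily those of the database: heterogeneity is the fraction of reads not matching the dominant sequence, and reads sharing the reference's first 7 nt are treated as having the same 5' end and are tallied by 3' length offset.

```python
from __future__ import annotations

from collections import Counter


def isomir_summary(read_counts: dict[str, int], reference: str, seed_len: int = 7) -> dict:
    """Summarize sequence heterogeneity for reads assigned to one miRNA.

    read_counts: read sequence -> number of reads
    reference:   the annotated mature sequence (e.g. from miRBase)
    """
    total = sum(read_counts.values())
    dominant, dominant_count = max(read_counts.items(), key=lambda kv: kv[1])
    end_offsets = Counter()
    for seq, count in read_counts.items():
        # Simplification: a shared first seed_len bases means a shared 5' end.
        if seq[:seed_len] == reference[:seed_len]:
            end_offsets[len(seq) - len(reference)] += count
    return {
        "dominant": dominant,
        "dominant_fraction": dominant_count / total,
        "reference_fraction": read_counts.get(reference, 0) / total,
        "heterogeneity": (total - dominant_count) / total,
        "end_offsets": dict(end_offsets),  # 3' length offset -> read count
    }
```

Separating 3' offsets from 5'-shifted reads reflects the point above: 3' end variation mainly affects probe and primer hybridization, whereas a 5' shift also changes the seed region.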
VX studies

Aim  5:  Analyze  time  course  experiments  of  rat  tissues  and  blood  exposed  to  VX. 

Yong  Zhou  and  Kai  Wang  reporting 

Introduction: The lethal contact nerve agent VX (S-(diisopropylaminoethyl)
methylphosphonothiolate O-ethyl ester) (Figure 20) is a potent organophosphate that inhibits
acetylcholinesterase (AChE), the enzyme responsible for breaking down the neurotransmitter
acetylcholine (ACh). The resulting accumulation of excess ACh overstimulates cholinergic
synapses, including neuromuscular junctions and the nerve terminals that control smooth
muscle, cardiac muscle, and exocrine gland function. Compared with other organophosphate nerve
agents, VX is more stable, more resistant to detoxification, more efficient at penetrating
skin, and more environmentally persistent.


Figure  20.  The  chemical  structure  of  VX.  The  space  filling  model  is  shown  on  the  left  while  the 
chemical  structure  is  on  the  right. 

Although blood cholinesterase (ChE) activity has been used as an indicator of nerve agent
exposure or as an index of recovery, the measurement suffers from high background. In
addition, the high variability of normal blood ChE activity makes it difficult to use as a
reliable indicator of organophosphate exposure. To further understand the mechanisms of
VX-induced toxicity and to identify more reliable candidate blood biomarkers of VX exposure
and recovery, scientists at ISB have, since 2009, been collaborating with Dr. Jennifer Sekowski
at Edgewood Chemical Biological Center (ECBC) to perform systems biology studies on rat tissues
and plasma exposed to VX. Overall, we received samples from 23 VX-treated rats (subcutaneous
injection at 80% of the LD50, 12 µg/kg) and 16 control rats. Time-course serum samples were
collected before treatment ('pre') and at 2, 24, 48, and 72 hours post-exposure, and brain and
liver samples were collected at 72 hours post-exposure.

Proteome studies of rat brain tissue exposed to VX Nerve Gas: First, quantitative global
proteomic analyses were conducted separately on the cerebellum and cerebrum by quantifying
relative protein abundances in 4 VX-treated and 4 control rat cerebellum/cerebrum samples,
labeling tryptic peptides with 8-plex iTRAQ reagents and performing LC-MS/MS analyses.

These two proteomic experiments yielded relative quantification of approximately 1,000
proteins in both brain regions. Interestingly, only two proteins, GFAP (glial fibrillary
acidic protein) and SPTBN2 (spectrin, beta, non-erythrocytic 2), showed statistically
significant differences (p < 0.05): both were reduced in VX-treated cerebellum (Figures 21B and
22B) but not in cerebrum (Figure 23). Western blot experiments on the same rat cerebellum
samples confirmed the downward trend of GFAP but not of SPTBN2 (Figures 21A and 22A).
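The report states p < 0.05 but not which test was applied. As one option suited to a 4 versus 4 design, this sketch computes an exact two-sided permutation p-value for the difference in mean relative abundance between groups; it is not necessarily the test used in the study, and the function name is hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def permutation_p_value(group_a: list[float], group_b: list[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def abs_mean_diff(idx_a: tuple[int, ...]) -> float:
        a = [pooled[i] for i in idx_a]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx_a]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(tuple(range(n_a)))
    splits = list(combinations(range(len(pooled)), n_a))
    # Count relabelings at least as extreme as observed (small tolerance for float ties).
    extreme = sum(1 for s in splits if abs_mean_diff(s) >= observed - 1e-12)
    return extreme / len(splits)
```

With four animals per group there are C(8,4) = 70 ways to assign the eight samples to two groups, so the smallest attainable two-sided p-value is 2/70, about 0.029. This limits how strong the evidence for any single protein can be at this sample size.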

[Figure 21 panels: (A) Western blot lanes C1-C4 (control) and V1-V4 (VX-treated); (B)
iTRAQ-based abundance per sample (C, control; V, VX-treated)]

Figure 21. Abundance of GFAP in cerebellum of control and VX-treated rats, as measured by
Western blot (A) and iTRAQ-based proteomic analysis (B), respectively.
Figure 22. Abundance of SPTBN2 in cerebellum of control and VX-treated rats, as measured by
Western blot (A) and iTRAQ-based proteomic analysis (B), respectively.
Figure 23. Differences in levels of GFAP and SPTBN2 in cerebrum samples of rats treated with
saline (control) or VX. Error bars represent standard errors among biological replicates
(n=4); p > 0.4 for both GFAP and SPTBN2.

Circulating microRNAs in animals exposed to VX Nerve Gas: To determine whether circulating
miRNAs can also serve as more reliable biomarkers of organophosphate exposure, we performed an
initial survey of circulating miRNA spectra in time-course serum from VX-exposed rats
(control, and 2, 24, 48 and 72 hrs post exposure).

The amount of total RNA extracted from pooled serum samples ranged from 950 to 1600 ug/ml, and
the estimated miRNA population ranged from 460 to 1150 ug/ml, based on Agilent 2100
Bioanalyzer results (Figure 24). Total RNA and miRNA in the blood decreased gradually after VX
exposure. This may indicate severe and persistent injury beyond 48 hours after VX exposure;
however, more detailed pathological information is needed before drawing any conclusions
about the decrease in RNA levels. There is also a sudden decrease in RNA and miRNA levels 2
hrs post exposure, which may have resulted from acute injury or from the animals' responses to
VX exposure.


Figure 24. The quality and quantity of RNA isolated from VX-exposed serum samples. The
Bioanalyzer 2100 image is shown on the left; RNA (blue bars) and estimated microRNA (yellow
bars) concentrations are shown on the right. The RNA and miRNA concentrations were obtained
from two independent measurements. The size range used to estimate miRNA concentration is
labeled on the Bioanalyzer image.
Using a TaqMan low-density array card, we identified 62 circulating miRNAs that were
detectable in all samples except those from 72 hrs post exposure. Thirty-nine of them displayed
significant changes over the experimental period (Figure 25). As with blood total RNA
concentrations, the levels of specific miRNAs also decreased gradually after VX exposure.
Several miRNAs, including mir-128s, mir-27b, mir-15b and mir-let7e, showed a significant
decrease during the first two hours of VX exposure. These miRNAs may have great potential as a
signature for VX or general organophosphate exposure. More samples from exposures to different
organophosphate compounds are needed to determine the specificity and sensitivity of these
circulating miRNAs for VX exposure.
The most affected miRNA species from VX exposure are mir-let7e and mir-133a.

The decrease of mir-133a is especially interesting because this miRNA is highly enriched in
muscle. VX has a well-known, severe effect on muscle, which may cause the decrease of mir-133a
(Fig 26A). Unlike mir-133a, several miRNAs, such as the mir-30s, showed an initial decrease
followed by a gradual recovery (Fig 26B). This may suggest the recovery of some biological
functions after the initial VX exposure; however, more pathological information is needed to
understand the recovery of certain miRNA species in the blood. We did not detect brain-specific
miRNAs, such as mir-124, in the blood, which may indicate that the VX nerve toxin did not cause
significant structural damage to the CNS.


Figure 25. Hierarchical cluster analysis of 39 differentially expressed miRNAs in VX-exposed
serum relative to control. The identities of the miRNAs are shown on the right and the samples
are indicated at the top.
Figure 26. Relative miRNA changes, compared to controls, over the experimental period.
mir-133a shows a gradual decrease (A), while the mir-30s (B) regain their normal levels 48
hours after VX exposure.

Changes in the spectrum of circulating microRNA (miRNA) in blood correlate well with various
physiopathological conditions, including cancers, cardiovascular conditions and liver
injuries. These observations clearly suggest that the spectrum of extracellular blood miRNA can
serve as an informative biomarker of the body's physiopathological status. In general,
detecting specific miRNA species in blood is also much easier than detecting protein-based
markers, in part because miRNA signals can be amplified by PCR.

Quantitative mapping of the serum proteome in VX-exposed rats: Our goals for serum
proteome profiling were 1) to test the hypothesis that the serum proteome contains
signals that correlate with VX exposure over the time course (from 0 h to 72 h), 2) to
identify putative biomarkers for VX exposure and recovery, and 3) to further understand
the mechanisms and targets of VX-induced toxicity. Considering the complexity of the
serum proteome, the high dynamic range of serum protein concentrations, and the
difficulty of analyzing proteomics data with multiple time points, we adopted an approach
that combines affinity-column depletion of high-abundance plasma proteins with
high-resolution, high-sensitivity mass spectrometry for label-free quantitative proteomic
analysis. To allow comparison with the microRNA profiling data, the same pooled serum
sample set was used in the proteome profiling experiment.

Using this approach, we identified 3,189 unique peptides from 233 distinct proteins in a
set of five pooled VX-treated rat sera (15 µl each), with a ProteinProphet cut-off score
of >0.7 and an error rate of <0.05. The average sequence coverage is about 15.0%.
Although the majority of identified proteins are well-known, highly abundant serum
proteins, a number of tissue-derived proteins were also identified. For example,
vesicle-associated membrane protein 1 (Vamp1), dendrin (Ndn), periaxin (Prx), and
solute carrier family 5 (choline transporter) (Slc5a7) are cytoplasmic/membrane proteins
that are highly expressed in the brain.
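The protein and peptide counts above come from applying a ProteinProphet probability cut-off to the identifications. The following is a minimal sketch of that filtering step over already-parsed records; the `ProteinHit` structure, its field names, and the example records are hypothetical (this is not ProteinProphet's output format), and the separate model-estimated error rate (<0.05) is not recomputed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    accession: str
    probability: float     # ProteinProphet probability (0-1)
    coverage_pct: float    # % of the protein sequence covered by peptides
    peptides: frozenset    # unique peptide sequences assigned to the protein


def summarize(hits, min_probability=0.7):
    """Keep proteins above the probability cut-off and report
    (protein count, unique peptide count, mean sequence coverage %)."""
    kept = [h for h in hits if h.probability > min_probability]
    if not kept:
        return 0, 0, 0.0
    peptides = set().union(*(h.peptides for h in kept))  # shared peptides counted once
    mean_coverage = sum(h.coverage_pct for h in kept) / len(kept)
    return len(kept), len(peptides), mean_coverage


# Hypothetical identifications, for illustration only.
hits = [
    ProteinHit("P1", 0.99, 20.0, frozenset({"PEPTIDEA", "PEPTIDEB"})),
    ProteinHit("P2", 0.85, 10.0, frozenset({"PEPTIDEB", "PEPTIDEC"})),
    ProteinHit("P3", 0.40, 5.0, frozenset({"PEPTIDED"})),
]
print(summarize(hits))  # (2, 3, 15.0)
```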

In the label-free proteomics analysis, 7,231 features were aligned across a minimum of 3
LC-MS runs. The scatter plots (Figure 27) show that hundreds of features are either
increased or decreased at 24 h and 48 h after VX exposure. Although the majority of
these changes are not statistically significant because of the small sample size, this
result is still encouraging. Moreover, because the intensities of the same features in
duplicate runs at the same time point are more consistent than those at different time
points, the variation observed at 24 h and 48 h likely originates from the samples rather
than from run-to-run technical variation.
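The replicate-versus-time-point argument can be made quantitative by correlating log intensities of the aligned features. The sketch below compares two replicate runs at one time point with a run from another time point; the intensities are hypothetical, and using Pearson correlation on log2 intensities is an assumption rather than the report's stated procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_intensities(values):
    return [math.log2(v) for v in values]


# Hypothetical intensities of the same five aligned features in three runs.
runs = {
    ("0h", "rep1"): [1.0e6, 2.0e5, 5.0e4, 8.0e5, 3.0e5],
    ("0h", "rep2"): [1.1e6, 1.9e5, 5.2e4, 7.8e5, 3.1e5],
    ("24h", "rep1"): [4.0e5, 6.0e5, 5.1e4, 8.1e5, 9.0e4],
}
logs = {key: log2_intensities(v) for key, v in runs.items()}

r_replicates = pearson(logs[("0h", "rep1")], logs[("0h", "rep2")])
r_time_points = pearson(logs[("0h", "rep1")], logs[("24h", "rep1")])
print(f"replicates r = {r_replicates:.3f}, 0h vs 24h r = {r_time_points:.3f}")
```

A replicate correlation clearly higher than the between-time-point correlation is what the text's reasoning relies on.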
Figure 27: Aligned feature intensity comparisons between VX-treatment time points. Scatter plots
compare either two replicates at the same time point (gray shaded) or the mean intensities of two
different time points (off-diagonal).

Using time-course statistical tools such as EDGE (Extraction of Differential Gene
Expression), we identified 63 proteins whose levels in blood changed significantly during
VX exposure (Table 10). Interestingly, 22 of them have been reported previously as
inflammatory response- and/or acute-phase-associated plasma proteins, which further
suggests the involvement of inflammatory processes in VX-induced injury.

For example, ITIH4 (inter-alpha-trypsin inhibitor heavy chain 4) has been identified
as an acute-phase protein isolated from cattle during experimental infection with a
mixture of Actinomyces pyogenes, Fusobacterium necrophorum, and
Peptostreptococcus indolicus (M. Pineiro, et al., Infection and Immunity, 2004, Vol. 72,
pp. 3777-3782).

In addition, 8 liver-specific proteins are present in the list of differentially expressed
proteins. The changes in these liver-enriched proteins suggest the involvement of the
liver in VX-induced toxicity. For example, carboxylesterase 1 precursor (CES1), a
liver-specific intracellular membrane protein involved in the detoxification of xenobiotics
and in the activation of ester and amide prodrugs, was suppressed after 24 hours of VX
exposure, which may indicate a close association of CES1 with VX metabolism.
Gene Symbol | Protein Description | VX0h | VX2h | VX24h | VX48h | VX72h | P-value | Q-value | Inflammatory/Acute phase*
Afm | afamin | 0 | -0.1899 | -0.4208 | -0.2393 | -0.9850 | 0.0118 | 0.1169 |
Ahsg | alpha-2-HS-glycoprotein | 0 | 0.0804 | 0.5875 | 0.5217 | 0.2555 | 0.0367 | 0.1386 | Yes
Alb | albumin | 0 | 0.1133 | 1.0945 | 0.4803 | 0.0170 | 0.0004 | 0.0157 |
Apoa1 | apolipoprotein A-I | 0 | -0.2064 | -0.3466 | -0.1566 | -0.6569 | 0.0379 | 0.1395 |
Apoa4 | apolipoprotein A-IV | 0 | 0.1177 | -0.0527 | -0.1849 | -0.2471 | 0.0170 | 0.1169 |
Apob | apolipoprotein B | 0 | -0.9853 | -0.9146 | -0.8583 | 10.7983 | 0.0000 | 0.0006 |
Apoe | apolipoprotein E | 0 | 0.1555 | -0.4810 | -0.3931 | -0.6985 | 0.0210 | 0.1169 |
Apoh | apolipoprotein H | 0 | 0.5107 | 0.9790 | 0.7262 | 0.7313 | 0.0190 | 0.1169 |
Apom | apolipoprotein M | 0 | 0.0107 | -0.2633 | -0.1616 | -0.2565 | 0.0413 | 0.1452 |
C2 | complement component 2 | 0 | 0.0662 | 0.6389 | 0.7299 | 0.5742 | 0.0196 | 0.1169 | Yes
C3 | complement component 3 | 0 | 0.0932 | 0.2772 | -0.2338 | -0.4614 | 0.0218 | 0.1169 | Yes
C9 | complement component 9 | 0 | 0.4811 | 1.1476 | 0.6213 | 0.7258 | 0.0349 | 0.1356 | Yes
Cfh | complement component factor H | 0 | 0.6312 | 2.3729 | 1.4048 | 2.0601 | 0.0049 | 0.1035 | Yes
Cfi | complement factor I | 0 | 0.7813 | 1.4797 | 1.2300 | 1.2613 | 0.0119 | 0.1169 | Yes
Cfp | complement factor properdin | 0 | -0.0144 | -0.2274 | -0.4097 | -0.6290 | 0.0146 | 0.1169 | Yes
Clu | clusterin | 0 | 0.1142 | -0.4377 | -0.3554 | -0.5143 | 0.0250 | 0.1169 | Yes
Cp | ceruloplasmin | 0 | 0.6883 | 2.1067 | 1.3146 | 1.5143 | 0.0011 | 0.0329 |
Cpn1 | carboxypeptidase N, catalytic chain | 0 | -0.1075 | -0.4720 | -0.2857 | -0.5282 | 0.0241 | 0.1169 |
D3ZC54_RAT | Uncharacterized protein | 0 | -0.1707 | -0.1245 | -5.4128 | 10.4865 | 0.0259 | 0.1169 |
D4AC77_RAT | Uncharacterized protein | 0 | 0.1481 | -0.2754 | -9.8188 | -1.1309 | 0.0002 | 0.0109 |
Es1 | esterase 1 | 0 | -0.1575 | -0.6346 | -0.4511 | -0.4119 | 0.0037 | 0.0932 |
F2 | coagulation factor II | 0 | 0.9298 | 2.4570 | 2.8775 | 2.3656 | 0.0444 | 0.1519 | Yes
Fetub | fetuin beta | 0 | 0.7428 | 2.7018 | 2.0964 | 2.1106 | 0.0002 | 0.0109 |
Fga | fibrinogen, alpha polypeptide | 0 | 0.3692 | 1.7791 | 0.9855 | 1.0976 | 0.0099 | 0.1169 |
Fgb | fibrinogen, B beta polypeptide | 0 | 0.4264 | 1.9615 | 1.0351 | 1.1902 | 0.0186 | 0.1169 |
Fgg | fibrinogen, gamma polypeptide | 0 | 0.4821 | 1.5807 | 0.5478 | 1.1169 | 0.0240 | 0.1169 |
Fn1 | fibronectin 1 | 0 | 0.5184 | 1.4067 | 0.2692 | 1.0195 | 0.0048 | 0.1033 | Yes
Gc | group specific component | 0 | -0.0692 | -0.9088 | -0.4078 | -0.6824 | 0.0160 | 0.1169 |
Gsn | gelsolin | 0 | 3.5089 | 8.9027 | 7.9854 | 8.3943 | 0.0206 | 0.1169 |
Hbb | hemoglobin, beta | 0 | -0.3067 | -0.4307 | -0.4960 | -1.8646 | 0.0170 | 0.1169 |
Hp | haptoglobin | 0 | 0.2375 | 0.8476 | 0.1169 | 0.2776 | 0.0038 | 0.0932 |
Hps5; Saa4 | Hermansky-Pudlak syndrome 5 | 0 | 0.5478 | 2.8935 | -7.7999 | -2.8250 | 0.0456 | 0.1542 |
Hpx | hemopexin | 0 | 0.0924 | 1.7479 | 0.7231 | 0.9171 | 0.0140 | 0.1169 |
Hrg | histidine-rich glycoprotein | 0 | 0.6258 | 2.2868 | 2.1702 | 1.9479 | 0.0004 | 0.0181 |
Igh-1a | IgG heavy chain 1a (serum IgG2a) | 0 | 0.2443 | 1.0688 | -0.1614 | 0.5514 | 0.0186 | 0.1169 |
Itih1 | inter-alpha trypsin inhibitor, heavy chain 1 | 0 | 0.0239 | -0.2731 | -0.7616 | -0.1865 | 0.0066 | 0.1169 |
Itih3 | inter-alpha trypsin inhibitor, heavy chain 3 | 0 | 0.2360 | 0.6870 | 0.2759 | 0.3053 | 0.0366 | 0.1386 |
Itih4 | inter alpha-trypsin inhibitor, heavy chain 4 | 0 | -0.1000 | -0.3499 | -0.1456 | -0.5120 | 0.0240 | 0.1169 | Yes
Kng2 | kininogen 2 | 0 | 0.4059 | 2.0760 | 1.1404 | 1.4665 | 0.0063 | 0.1169 |
LOC297568 | alpha-1-inhibitor III | 0 | -0.1183 | -0.3144 | -0.1829 | -0.6539 | 0.0001 | 0.0068 | Yes
LOC299282 | Serine protease inhibitor | 0 | -0.2560 | -0.4686 | -0.2345 | -0.5014 | 0.0083 | 0.1169 |
LOC360504 | hemoglobin alpha 2 chain | 0 | -0.2225 | -0.3069 | -0.1715 | -1.1911 | 0.0292 | 0.1218 |
LOC498793 | inter-alpha-inhibitor H2 chain | 0 | -0.1179 | -0.1901 | -0.3442 | 0.0429 | 0.0235 | 0.1169 |
MGC108747 | similar to alpha-1 major acute phase protein prepeptide | 0 | 0.2751 | 0.1718 | -0.5375 | -0.3147 | 0.0393 | 0.1416 | Yes
Mug1 | Murinoglobulin 1 homolog | 0 | -0.4060 | -1.3770 | -0.6357 | -1.3577 | 0.0018 | 0.0488 |
Mug2 | murinoglobulin 2 | 0 | 0.0380 | 1.6785 | 1.7924 | 1.4022 | 0.0039 | 0.0947 |
Nfkbil2 | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 2 | 0 | 0.9788 | 1.8048 | 1.9983 | 1.7982 | 0.0217 | 0.1169 | Yes
Orm1 | orosomucoid 1 | 0 | 0.0723 | 2.1272 | 0.7767 | 1.2284 | 0.0124 | 0.1169 | Yes
Pf4 | platelet factor 4 | 0 | -0.1160 | 0.0309 | -0.5258 | -0.7967 | 0.0338 | 0.1335 |
Plg | plasminogen | 0 | 0.3908 | 1.8879 | 1.8390 | 1.5486 | 0.0049 | 0.1035 |
Pzp | pregnancy-zone protein | 0 | -0.0047 | 0.2190 | -0.9434 | -0.5315 | 0.0187 | 0.1169 |
RGD1309019 | similar to Ras GTPase-activating protein nGAP (RAS protein activator like 1) | 0 | -0.4546 | -1.1836 | -0.2645 | -0.7542 | 0.0291 | 0.1216 |
Serpina1 | serine (or cysteine) proteinase inhibitor, clade A, member 1 | 0 | 0.0539 | 1.6327 | 0.9244 | 1.0056 | 0.0245 | 0.1169 | Yes
Serpina10 | serine (or cysteine) peptidase inhibitor, clade A, member 10 | 0 | -0.5708 | 10.0944 | -0.2797 | -5.3584 | 0.0000 | 0.0006 |
Serpina3k | serine (or cysteine) peptidase inhibitor, clade A, member 3K | 0 | -0.1347 | -0.3371 | -0.1116 | -0.5525 | 0.0017 | 0.0488 | Yes
Serpina3m | serine (or cysteine) proteinase inhibitor, clade A, member 3M | 0 | -0.2254 | -0.5507 | -0.3047 | -0.3596 | 0.006P | 0.1169 | Yes
Serpina3n | serine (or cysteine) peptidase inhibitor, clade A, member 3N | 0 | 0.3991 | 2.2099 | 1.4240 | 1.7527 | 0.0058 | 0.1156 | Yes
Serpina6 | serine (or cysteine) peptidase inhibitor, clade A, member 6 | 0 | -0.0028 | -0.5596 | -0.3196 | -0.5268 | 0.0018 | 0.0488 |
Serpinf2 | serine (or cysteine) peptidase inhibitor, clade F, member 2 | 0 | 0.9135 | 2.2223 | 1.6176 | 1.8295 | 0.0043 | 0.0956 | Yes
Serping1 | serine (or cysteine) peptidase inhibitor, clade G, member 1 | 0 | 0.8368 | 1.5576 | 1.4403 | 1.2841 | 0.0344 | 0.1348 | Yes
Srprb | signal recognition particle receptor, B subunit | 0 | -0.1772 | -0.4159 | -0.4854 | -0.9634 | 0.0000 | 0.0006 |
Tf | Serotransferrin | 0 | 0.0891 | 1.3489 | 1.4019 | 0.9879 | o.o;o5 | 0.1169 | Yes
Ttr | transthyretin | 0 | -0.2240 | -0.4493 | -0.4808 | -0.6215 | 0.0069 | 0.1169 |

Table 10. List of 63 proteins that are differentially expressed in VX-exposed rat serum vs. control


Twenty-two of them are inflammatory and acute-phase response proteins and 8 are liver-specific
proteins (in bold). Differentially expressed proteins were identified using the EDGE time-course
analysis mode. Selected Q-value cut-off: 0.05, corresponding to an expected false discovery rate of 3.0%.

*: Data from "Dissecting the human plasma proteome and inflammatory response biomarkers"
(Chen, et al., Proteomics, 2009, 9, 470-484) and other literature.
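The Q-value column and the 0.05 cut-off control the false discovery rate across all tested proteins. EDGE is built around Storey's q-value method, which also estimates the proportion of unchanged proteins (π0). The sketch below instead shows the simpler Benjamini-Hochberg adjustment, which equals the q-value when π0 is fixed at 1 and is therefore somewhat more conservative. It does not reproduce EDGE's time-course test statistic, and the p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values. These equal Storey q-values
    when the proportion of true nulls (pi0) is fixed at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four proteins.
p = [0.01, 0.04, 0.03, 0.005]
q = bh_qvalues(p)
selected = [i for i, qv in enumerate(q) if qv <= 0.05]
print(q, selected)
```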


Summary: Under the current contract, scientists at ISB have conducted multiple
systems biology analyses of time-course experiments on rat tissues and blood exposed
to VX and observed that:

1. A proteomic analysis of two brain regions (cerebellum and cerebrum) revealed a
reduction in cerebellar glial fibrillary acidic protein (GFAP) at 72 h.

2. miRNA analysis demonstrated an elevated level of mir-133a, a miRNA preferentially
expressed in muscle tissue, in VX-exposed serum.

3. Proteomic analysis of the blood serum revealed 63 proteins whose serum levels
changed significantly during VX exposure. Twenty-two of them have been reported
previously as inflammatory response- and/or acute-phase-associated plasma proteins,
and 8 are highly expressed in the liver.

Although the integrated datasets from different levels of systems biology analysis
indicate that VX did not cause significant structural damage to the central nervous
system, the changes in liver-enriched proteins and inflammatory response-/acute-phase-associated
proteins in serum suggest the involvement of the liver and acute-phase
inflammatory processes in VX-induced toxicity. Also, the elevated level of mir-133a in
serum indicates VX-induced damage to muscle tissue.

However, more information is needed to understand the molecular effects of
VX-associated toxicity. For example, western blots on individual samples from both the
VX-treatment and control groups will be necessary to confirm these findings. We have
tested a few antibodies (Abs) in VX-treated and control serum samples, but all of them
failed because of low Ab specificity, high background noise from serum, or target-protein
abundance in the blood below the detection limit.

Meanwhile, a list of candidate serum protein/miRNA biomarkers has emerged from
these systems biology data sets. Many of them may merit further validation in
blood as potential biomarkers of VX exposure.

Alternative  protein  detection  methods 

Aim  6:  Develop  new  technologies  for  developing  protein-capture  agents  and  the 
analyses  of  single  protein  molecules. 

Project Aim 6 called for the development of new protein biomarker discovery methods
utilizing surface plasmon resonance (SPR) detection and antibody microarrays, and for the
application of those methods to in vivo and cell culture models of hepatotoxicity.
Chris Lausted reports:

Cell culture analyses: In addition to the studies summarized earlier (Aims 2, 3, and 4,
section C), we attempted to use in vitro models to study the effects of APAP and
additional toxic compounds. Hepatocyte secretions (from HepG2 and other liver cell lines)
were analyzed using antibody microarrays, which revealed only a very general pattern: the
injured cells reduced their secretion of highly produced proteins such as plasminogen
and fibrinogen, but no elevated liver-specific proteins were observed. Cells were treated
with toxic levels of amiodarone, nefazodone, tamoxifen, and troglitazone, as well as
nontoxic controls. Our protocol involved incubating the liver cells for 24 hours in
serum-free DMEM, removing the media, and filtering them through a 0.2 µm filter to remove
dead cells. The conditioned media were concentrated and then diluted in PBS to a final
concentration of 100 µg/ml.
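The final dilution step is a C1·V1 = C2·V2 calculation. The sketch below computes the PBS volume needed to bring a concentrate to the final working concentration given above (100 µg/ml); it assumes the total protein concentration of the concentrate has already been measured (the report does not say how), and the example numbers and function name are hypothetical.

```python
def pbs_to_add_ml(measured_ug_per_ml, volume_ml, target_ug_per_ml=100.0):
    """PBS volume (ml) that brings a concentrated sample to the target
    protein concentration, from C1 * V1 = C2 * V2."""
    if measured_ug_per_ml < target_ug_per_ml:
        raise ValueError("sample is below the target; concentrate it further first")
    final_volume_ml = measured_ug_per_ml * volume_ml / target_ug_per_ml
    return final_volume_ml - volume_ml


# Hypothetical concentrate: 450 ug/ml in 0.2 ml.
print(round(pbs_to_add_ml(450.0, 0.2), 3))  # 0.7
```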

It appears that low expression of cytochrome P450 enzymes and of the liver-specific
targets in these cell lines was a major limitation of this approach.

Antibody Development Antibodies are key materials for biomarker research because they are
required for microarrays and immunoblots. Furthermore, ELISA assays require pairs of
matched antibodies binding to separate protein epitopes. As antibodies were not
available for half of our liver-specific targets, we worked to develop a system to produce
high-quality monoclonal antibodies and antibody pairs via hybridoma technology.
Typically, hybridomas are selected based on an arbitrary level of binding to the target.

As we ultimately will require antibodies for use with blood samples, we wish to prioritize
the specificity of the antibody (where specificity is the difference in affinity for the target
protein relative to all other serum proteins).

We have developed a new method to evaluate the specificity of the hybridoma
antibodies using a single SPR microarray. The method involves antibody isotyping,
quantification, and a set of absolute affinity measurements. The microarray contains
antibodies to both IgG and IgM for isotyping. The kinetics of these anti-isotype antibodies
are measured once and then used to calculate all of the unknown hybridoma antibody
concentrations. The microarray also contains a comprehensive collection of liver-specific
targets, a number of abundant serum proteins, and total human, mouse, and rat serum
protein. With the antibody concentrations calculated, the target affinity and the
off-target affinities can all be determined in relative or absolute units. The difference
between the target affinity and the highest off-target affinity provides the best measure of
specificity. The most specific hybridoma antibodies were chosen for full-scale
production. This procedure is both economical and efficient, requiring no labeling and
only 20 µL of supernatant for quantitative analysis.
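The specificity criterion described above (target affinity minus the highest off-target affinity) can be sketched as a ranking function. This is a minimal illustration assuming affinities are expressed as association constants KA and that the difference is taken on a log10 scale, i.e. as a fold ratio; the clone names and values are hypothetical.

```python
import math


def specificity_margin(ka_target, ka_off_targets):
    """Log10 gap between on-target affinity and the strongest off-target
    affinity. Affinities are association constants KA (1/M); larger = tighter."""
    strongest_off_target = max(ka_off_targets.values())
    return math.log10(ka_target) - math.log10(strongest_off_target)


def rank_by_specificity(measurements):
    """measurements: {clone: (ka_target, {off_target: ka})}; best clone first."""
    margins = {
        clone: specificity_margin(ka_t, off)
        for clone, (ka_t, off) in measurements.items()
    }
    return sorted(margins, key=margins.get, reverse=True), margins


# Hypothetical SPR-derived affinities for two clones.
measurements = {
    "clone_A": (1e9, {"albumin": 1e6, "rat serum": 1e7}),
    "clone_B": (5e9, {"albumin": 1e6, "rat serum": 1e9}),
}
ranking, margins = rank_by_specificity(measurements)
print(ranking)  # ['clone_A', 'clone_B']
```

In this example clone_B binds the target more tightly but ranks lower, which illustrates why selection on target binding alone can favor a less specific antibody.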
Figure 28. SPR sensorgrams determine the isotype and concentration of antibody in hybridoma
supernatants. Twelve hybridoma supernatants raised against a COMT-GST recombinant target
all show binding to the target (blue trace). Eight hybridomas contain IgM antibodies (green trace
is higher than red trace). Four hybridomas contain IgG antibodies (red trace is higher than green
trace). Either the isotype-specific binding (red and green traces) or the isotype-independent
binding (black trace) may be used to determine hybridoma antibody concentration.
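One standard way to turn binding to a pre-characterized capture spot into a concentration is the initial-slope method. Under a 1:1 Langmuir model, dR/dt = ka·C·(Rmax − R) − kd·R, so at the start of injection (R ≈ 0) the slope is ka·C·Rmax and C = slope / (ka·Rmax). The report does not state which model was used; this sketch assumes 1:1 kinetics without mass-transport limitation, with ka and Rmax known from the one-time calibration, and all numbers are hypothetical.

```python
import math


def simulate_association(t, conc_m, ka, kd, rmax):
    """1:1 Langmuir association response (RU) at time t (s)."""
    kobs = ka * conc_m + kd
    req = rmax * ka * conc_m / kobs
    return req * (1.0 - math.exp(-kobs * t))


def initial_slope(times, responses, n_points=5):
    """Least-squares slope (RU/s) over the first n_points of the curve."""
    t, r = times[:n_points], responses[:n_points]
    n = len(t)
    mt, mr = sum(t) / n, sum(r) / n
    num = sum((a - mt) * (b - mr) for a, b in zip(t, r))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


def concentration_from_slope(slope, ka, rmax):
    """At t -> 0 (R ~ 0), dR/dt = ka * C * Rmax, so C = slope / (ka * Rmax)."""
    return slope / (ka * rmax)


# Hypothetical calibration: ka = 1e5 1/(M*s), kd = 1e-3 1/s, Rmax = 1000 RU.
ka, kd, rmax = 1e5, 1e-3, 1000.0
times = [0.0, 1.0, 2.0, 3.0, 4.0]
curve = [simulate_association(t, 1e-8, ka, kd, rmax) for t in times]
estimate = concentration_from_slope(initial_slope(times, curve), ka, rmax)
print(f"estimated concentration: {estimate:.3e} M")  # close to 1e-8 M
```

The simulated curve only serves to check the estimator; with real sensorgrams the slope window must stay short enough that R remains small relative to Rmax.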


ELISA Development As biomarkers are of little value without an assay, we worked to
develop ELISAs for each candidate biomarker. Using the SPR data, hybridomas were
chosen for production based on their suitability for sandwich assays. The resulting
antibodies were used to develop conventional chemiluminescent ELISAs. We
successfully produced assays for human DPYS, BHMT (capture with monoclonal
ISB030-4D1, detection with ProteinTech #212), ALDOB (capture with monoclonal
ISB030-5G6, detection with polyclonal), ALDH1L1 (capture with monoclonal ISB030-1C5,
detection with polyclonal), PAH (capture with monoclonal ISB030-2A5, detection
with polyclonal 036#1), HAO1 (capture with monoclonal ISB030-1F3, detection with
polyclonal), MAT1A (capture with monoclonal ISB030-4A3, detection with polyclonal
036#1), and PIPOX (capture with monoclonal ISB030-3A4, detection with polyclonal
036#1).
The linear range always exceeded two orders of magnitude, with typical limits of quantitation
in the low nanogram-per-milliliter range, using 250 ng of each antibody in the microtitre
plate format.
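The report does not say how the limits of quantitation were derived. As an illustration only, the sketch below fits a straight-line standard curve within the linear range, back-calculates an unknown, and applies the ICH Q2 convention LOQ = 10·σ/S (σ = standard deviation of blank replicates, S = calibration slope). The standards, signals, and blanks are hypothetical; full-range ELISA curves are often fitted with a four-parameter logistic model instead.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def limit_of_quantitation(blank_signals, slope):
    """ICH Q2-style LOQ = 10 * sigma / S, sigma = SD of blank replicates."""
    return 10.0 * statistics.stdev(blank_signals) / slope


# Hypothetical standards (ng/ml) and chemiluminescence readings (RLU).
standards = [1.0, 3.0, 10.0, 30.0, 100.0]
signals = [250.0, 350.0, 700.0, 1700.0, 5200.0]
blanks = [200.0, 210.0, 190.0, 205.0, 195.0]

slope, intercept = fit_line(standards, signals)
loq = limit_of_quantitation(blanks, slope)
unknown = (1200.0 - intercept) / slope  # back-calculated concentration, ng/ml
print(f"slope={slope:.1f} RLU per ng/ml, LOQ={loq:.2f} ng/ml, unknown={unknown:.1f} ng/ml")
```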

Because panels of biomarkers can be more accurate than individual biomarkers, we also
worked to develop a multiplexed ELISA to quantify all eight of these liver-specific
proteins in addition to a dozen cytokine markers of inflammation. Inflammation is an
important factor in drug-induced liver injury. We worked to develop this 20-plex protein
panel using Nanostring nCounter technology. Although nCounter is intended for RNA
measurement, we developed a high-sensitivity protein version of the assay that requires
only 5 µL of serum. The cytokine portion of the assay was successful, with sensitivity
comparable to conventional single-plex ELISAs. However, cross-reactivity among the
liver-specific proteins was observed, reducing sensitivity to a level inadequate for
detecting liver injury. This may indicate that the specificity of our new antibodies is lower
than that of the cytokine antibodies used. We will continue to work towards improving and
implementing this assay through other sources of funding.

SPR Software: Software created during the technology development portion of this
research has been made public. OSPRAI is the open-source software project at ISB for
the analysis of the high-throughput data generated by Surface Plasmon Resonance
Imaging. OSPRAI is developed and used by liver toxicity project members, as well as
others, for analyzing the antigen arrays used in antibody development and for
quantitative proteomics with antibody microarrays. The documentation and code
repository are hosted on the servers of the Bioinformatics Organization
(http://www.bioinformatics.org/groups/?group_id=1018), which hosts bioinformatics
collaborations free of charge for academic use. Services include software
version control (Subversion SVN), bug tracking, forums, and a web page. OSPRAI
includes tools for converting common SPR data formats (e.g. Biacore, Lumera, CLAMP,
spreadsheets), signal calibration, outlier removal, and kinetic parameter determination
by curve fitting. Prior to this, no such software had been available for SPR microarray
analysis.
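The report does not describe OSPRAI's fitting routines or API. As a generic illustration of kinetic parameter determination by curve fitting, the sketch below fits the dissociation phase of a 1:1 interaction, R(t) = R0·exp(−kd·t), by linear regression of ln R on t; with ka from the association phase, the equilibrium constant follows as KD = kd/ka. The function name and the noise-free data are hypothetical, and a log-linear fit over-weights low, noisy signals, so nonlinear least squares is often preferred on real sensorgrams.

```python
import math


def fit_dissociation(times, responses):
    """Fit R(t) = R0 * exp(-kd * t) by linear regression of ln(R) on t.
    Returns (kd in 1/s, R0 in RU). Responses must be baseline-subtracted and > 0."""
    if any(r <= 0 for r in responses):
        raise ValueError("responses must be positive after baseline subtraction")
    y = [math.log(r) for r in responses]
    n = len(times)
    mt, my = sum(times) / n, sum(y) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - my) for t, v in zip(times, y))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mt)


# Hypothetical noise-free dissociation curve: R0 = 300 RU, kd = 5e-3 1/s.
times = [10.0 * i for i in range(11)]
responses = [300.0 * math.exp(-5e-3 * t) for t in times]
kd, r0 = fit_dissociation(times, responses)
print(f"kd = {kd:.2e} 1/s, half-life = {math.log(2) / kd:.0f} s, R0 = {r0:.0f} RU")
```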